paracrawl / Domain_Adaptation

InDomain detection is a tool designed to extract in-domain data from large collections of data.
GNU General Public License v3.0

Language codes #18

kpu closed this issue 5 years ago

kpu commented 5 years ago

What are the language codes used for? I think they're just for the tokenizer, in which case it would be best to say so. Also, are they ISO 639-1?

dionwiggins commented 5 years ago

Comments added in the definitions.