google / corpuscrawler

Crawler for linguistic corpora

Documentation > Clarify language code system in use #83

Closed: hugolpz closed this issue 3 years ago

hugolpz commented 3 years ago

Tiny issue: add a reference to the language code system you use.

I may have missed it.

See also: https://en.wikipedia.org/wiki/Language_code

brawer commented 3 years ago

It’s IETF language tags. Same coding system as used by HTML, XML, and almost everything else on the internet.

hugolpz commented 3 years ago

Is it consistent? Do 100% of your file names use this convention?

brawer commented 3 years ago

The README file already says “IETF BCP47”. Yes, they should be consistent; are you aware of any cases that aren’t IETF language tags?
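One way to answer the consistency question is to scan the file names and flag any that are not well-formed BCP 47 tags. Below is a minimal sketch, assuming files are named `<tag>.<ext>` (e.g. `de-CH.txt`). That naming scheme and the helper names `is_well_formed_tag` and `non_conforming_names` are hypothetical, not taken from corpuscrawler. The pattern is a simplified subset of the RFC 5646 grammar: it covers language, extlang, script, region and variant subtags, skips extensions, private-use and grandfathered tags, and checks only syntax, not whether a subtag is actually registered with IANA.

```python
import re
from pathlib import Path

# Simplified well-formedness check for BCP 47 (RFC 5646) language tags.
# Assumption: only language/extlang/script/region/variant subtags matter here;
# extensions, private-use and grandfathered tags are deliberately not handled.
_LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}       # primary language + optional extlang subtags
       |[a-z]{4,8})                        # 4-8 letter language subtag
    (?:-[a-z]{4})?                         # script, e.g. Latn
    (?:-(?:[a-z]{2}|[0-9]{3}))?            # region, e.g. CH or 419
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*  # variants, e.g. 1996
    """,
    # Tags are case-insensitive; re.ASCII keeps [a-z] from matching non-ASCII letters.
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def is_well_formed_tag(tag: str) -> bool:
    """Return True if `tag` matches the simplified BCP 47 pattern above."""
    return _LANGTAG.fullmatch(tag) is not None


def non_conforming_names(filenames):
    """Return the file names whose stem (name minus last extension) is not a tag."""
    return [name for name in filenames if not is_well_formed_tag(Path(name).stem)]


if __name__ == "__main__":
    names = ["de-CH.txt", "sr-Latn.txt", "es-419.txt", "en_US.txt", "e.txt"]
    print(non_conforming_names(names))  # ['en_US.txt', 'e.txt']
```

Here `en_US` is rejected because BCP 47 separates subtags with a hyphen, not the underscore used by POSIX locale names, which is a common source of inconsistency.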

hugolpz commented 3 years ago

Thanks. I reported it on LinguaLibre:Language_codes_systems_used_across_LinguaLibre