ropensci / cld3

Bindings to Google's Compact Language Detector 3
https://docs.ropensci.org/cld3
41 stars, 5 forks

Spanish manual language detection problems #2

Open silviaegt opened 3 years ago

silviaegt commented 3 years ago

Hi @jeroen! Thank you so much for this package, it runs so smoothly and it's so useful! I have been manually tagging the language of some Spanish texts and have found a few things that might be interesting, but I am unsure whether they would be useful for this wrapper, and I was wondering if you could point me in the right direction. For instance, in a list of conference titles, those written in "Spanglish" were tagged as English by cld2 and as Spanish by cld3. Also, while cld3 got much better at distinguishing Galician from Spanish, it still gets these tags wrong more than a few times. Can you think of anyone who could benefit from my manually tagged dataset?
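A comparison like the one described above could be summarised by counting how often each detector agrees with the manual tag. The following is only a minimal sketch in plain Python, not part of the cld2 or cld3 packages: the titles, the tags, and the helper names `confusion` and `accuracy` are all made up for illustration, and the detector columns are hard-coded placeholders rather than real output.

```python
from collections import Counter

# Toy rows: (title, manual_tag, cld2_tag, cld3_tag).
# Every value here is an invented placeholder, not real detector output.
rows = [
    ("Big data en las humanidades", "es", "en", "es"),
    ("Edición digital de textos", "es", "es", "gl"),
    ("A lingua galega na rede", "gl", "gl", "gl"),
    ("Mapping the archive", "en", "en", "en"),
]


def confusion(rows, column):
    """Count (manual_tag, predicted_tag) pairs for one detector column."""
    return Counter((row[1], row[column]) for row in rows)


def accuracy(rows, column):
    """Share of rows where the detector agrees with the manual tag."""
    hits = sum(1 for row in rows if row[1] == row[column])
    return hits / len(rows)


cld2_confusion = confusion(rows, 2)  # column 2 holds the cld2 tags
cld3_confusion = confusion(rows, 3)  # column 3 holds the cld3 tags

for name, column in (("cld2", 2), ("cld3", 3)):
    print(name, "accuracy:", accuracy(rows, column))
    for (manual, predicted), count in sorted(confusion(rows, column).items()):
        if manual != predicted:
            print(f"  manual={manual} predicted={predicted}: {count}")
```

With real data, the off-diagonal pairs (for example manual `es` predicted as `gl`) would show which language pairs each detector confuses most often.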

jeroen commented 3 years ago

You could ask the authors of the cld3 library at https://github.com/google/cld3. And of course you can always publish interesting datasets as an R package, just like e.g. nycflights13.