kupolak / textstat

Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.
MIT License

Language support clarification #41

Open scarroll32 opened 2 years ago

scarroll32 commented 2 years ago

In the README, a number of languages are listed as being supported.

Is this for all functions?

Languages supported:

US English
Catalan
Czech
Danish
Spanish
Estonian
Finnish
French
Hungarian
Indonesian
Icelandic
Italian
Latin
Dutch (Nederlands)
Bokmål (Norwegian)
Polish
Portuguese
Russian
Swedish

kupolak commented 2 years ago

@scarroll32 Hey, good point. Methods like difficult_words and forcast take an optional language argument, which defaults to 'en_us'. Most other methods currently support only English, so I should add language support to them as well. textstat uses the text-hyphen library to syllabify words and per-language easy-word dictionaries (in the lib/dictionaries/ folder), so supporting these languages should not be a problem. For now, though, only English is covered by tests.
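
To illustrate the idea of an optional language argument backed by per-language easy-word dictionaries, here is a minimal sketch. It is not the gem's code: the method name difficult_words_sketch and the inline EASY_WORDS table are hypothetical stand-ins (the gem reads its dictionaries from lib/dictionaries/), and the syllable check done through text-hyphen is left out.

```ruby
require 'set'

# Hypothetical stand-in for the per-language easy-word dictionaries.
# Assumption: tiny inline word lists replace the dictionary files.
EASY_WORDS = {
  'en_us' => Set.new(%w[the cat sat on a mat]),
  'es'    => Set.new(%w[el gato en la casa])
}.freeze

# Counts unique words that are missing from the easy-word list of the
# chosen language. The language argument is optional and defaults to 'en_us'.
def difficult_words_sketch(text, language = 'en_us')
  easy = EASY_WORDS.fetch(language) do
    raise ArgumentError, "no dictionary for language #{language}"
  end
  words = text.downcase.scan(/\p{L}+/).uniq
  words.count { |word| !easy.include?(word) }
end

puts difficult_words_sketch('The cat sat on a trampoline')
puts difficult_words_sketch('El gato en la biblioteca', 'es')
```

Calling the method without a language falls back to the English list, so Spanish text scored that way would report nearly every word as difficult. That is the kind of mismatch the question above is about.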