tbroadley / spellchecker-cli

A command-line tool for spellchecking files.
MIT License

Add Vietnamese support #63

Closed titieo closed 3 years ago

titieo commented 4 years ago

Excuse me, I've found a bug in spellchecker-cli. I'm using Mac OS X 10.11.6 and here is my command:

(spellchecker --plugins spell repeated-words syntax-urls --dictionaries dictionary/dictionary.txt dictionary/science.txt --files 'docs/data/**/*.md' && echo Spellcheck passed.) || (echo Spellcheck failed! Please review and fix errors/add words to dictionary as needed. && exit 1)

As you can see in the picture below, spellchecker-cli warned me about a word that is in dictionary.txt. This seems to happen only with words whose spelling is similar to that of another English word. I don't know if you can fix this issue.
image

tbroadley commented 4 years ago

Hi there! Thanks for the issue. Dictionaries are case-sensitive - that's why your lowercase dictionary entry isn't matching the capitalized word in the file being spellchecked. I think this behaviour makes sense - it allows the program to detect mis-capitalizations of words in the dictionary, but also allows users to ignore different capitalizations by including them in the dictionary.

To fix this, I'd suggest adding the capitalized version of the word to the dictionary. Let me know if that helps 🙂
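
To illustrate what case-sensitive matching means here, below is a minimal Python sketch. It is not spellchecker-cli's actual implementation; the function name `unknown_words` and the example word `photon` are made up for illustration, since the real word appears only in the screenshot.

```python
def unknown_words(words, dictionary_entries):
    """Return the words with no exact, case-sensitive match in the dictionary."""
    known = set(dictionary_entries)
    return [word for word in words if word not in known]


# Hypothetical dictionary entry and text; not taken from the reporter's files.
dictionary = ["photon"]
text_words = ["photon", "Photon"]

# Only the lowercase form is listed, so the capitalized form is reported.
print(unknown_words(text_words, dictionary))  # ['Photon']

# After adding the capitalized form as well, nothing is reported.
print(unknown_words(text_words, dictionary + ["Photon"]))  # []
```

With only the lowercase entry, the capitalized form is reported; once both forms are listed, nothing is.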

titieo commented 4 years ago

Hi, thanks for your suggestion. As far as I can tell, most of the errors that occur when I run the spellcheck are issues with capitalized words. I added them to my personal dictionary and that fixed it. Thanks for your help. 🙂

There is only one remaining problem, with repeated words. As you can see here, some languages have phrase words. In my case, for example, the phrase word is song song (Vietnamese), which means parallel (English), but retext-repeated-words treats it as two separate words instead of a single phrase word, which leads to the error shown below. I tried adding it to the dictionary, but that didn't help.
image

tbroadley commented 4 years ago

I'm glad that was helpful! And thanks for bringing up this other issue. As far as I can tell, retext-repeated-words isn't powerful enough to recognize phrase words from non-English languages. I think it's only intended to be run on English text, unfortunately. I think the best option would be to disable that plugin by leaving repeated-words out of the --plugins list.
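
For illustration, here is a rough Python sketch of why a simple adjacent-duplicate check flags song song. This is an assumption about the general technique, not retext-repeated-words' actual code; the function `repeated_words` and the sample sentence are hypothetical.

```python
def repeated_words(text):
    """Return each word that is identical to the word immediately before it."""
    words = text.split()
    return [
        current
        for previous, current in zip(words, words[1:])
        if previous == current
    ]


# Sample Vietnamese phrase containing "song song" (parallel), used only as an illustration.
print(repeated_words("hai đường thẳng song song"))  # ['song']
print(repeated_words("two parallel lines"))  # []
```

A check like this has no notion of phrase words, so a legitimate reduplicated word looks the same as an accidental repetition.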

Also I just found that retext-spell can integrate with this Vietnamese dictionary. I'd accept a PR to add support for that dictionary to spellchecker-cli. I think that'd be as simple as adding dictionary-vi as a dependency, adding a couple of tests, and documenting the change.

titieo commented 4 years ago

> Also I just found that retext-spell can integrate with this Vietnamese dictionary. I'd accept a PR to add support for that dictionary to spellchecker-cli. I think that'd be as simple as adding dictionary-vi as a dependency, adding a couple of tests, and documenting the change.

Sorry, I don't know much about coding (information technology); I have just enough skill to create a VuePress site by following their site, so I don't know how to add support for the Vietnamese dictionary to spellchecker-cli. I've tried reading index.js here as well as the API section of retext-spell, but I haven't found a way to add another language. I'm really sorry about this.

tbroadley commented 4 years ago

I understand, don't worry about it! 🙂