tbroadley / spellchecker-cli

A command-line tool for spellchecking files.
MIT License

Generated dictionary sometimes includes word twice, once followed by a period #26

Open tbroadley opened 6 years ago

tbroadley commented 6 years ago

For example, here.

tbroadley commented 6 years ago

Minimal Markdown example that causes this bug:

```markdown
misspellig.

* lowercase
```

The following does not trigger the bug:

```markdown
misspellig.

* Uppercase
```

It seems to be a bug in either remark-retext or retext itself. In the first example (lowercase list item), the period is included as a child of the WordNode that contains "misspellig". In the second example (uppercase list item), the period is a child of the SentenceNode that contains the WordNode.
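To make the difference concrete, here is a minimal TypeScript sketch that models the two tree shapes described above. It uses simplified, hypothetical nlcst-like types and assumes the period is represented as a `PunctuationNode`. It then collects each `WordNode`'s text, roughly the way a dictionary generator might. The helpers `toText` and `wordsOf` are illustrative only, not spellchecker-cli or retext APIs.

```typescript
// Simplified, hypothetical nlcst-like node types (real nlcst nodes carry more
// fields, such as `position`).
type Leaf = { type: "TextNode" | "PunctuationNode" | "WhiteSpaceNode"; value: string };
type Word = { type: "WordNode"; children: Leaf[] };
type Sentence = { type: "SentenceNode"; children: (Word | Leaf)[] };

// Concatenate all text values under a node, similar in spirit to nlcst-to-string.
function toText(node: Leaf | Word | Sentence): string {
  if ("value" in node) return node.value;
  const children: (Word | Leaf)[] = node.children;
  return children.map(toText).join("");
}

// Shape reported for the "* lowercase" example: the period sits inside the word.
const buggy: Sentence = {
  type: "SentenceNode",
  children: [
    {
      type: "WordNode",
      children: [
        { type: "TextNode", value: "misspellig" },
        { type: "PunctuationNode", value: "." },
      ],
    },
  ],
};

// Shape reported for the "* Uppercase" example: the period is a sibling of the word.
const expected: Sentence = {
  type: "SentenceNode",
  children: [
    { type: "WordNode", children: [{ type: "TextNode", value: "misspellig" }] },
    { type: "PunctuationNode", value: "." },
  ],
};

// Collect the text of every WordNode directly under a sentence.
function wordsOf(sentence: Sentence): string[] {
  return sentence.children
    .filter((child): child is Word => child.type === "WordNode")
    .map(toText);
}

console.log(wordsOf(buggy)); // [ 'misspellig.' ]
console.log(wordsOf(expected)); // [ 'misspellig' ]
```

Both trees stringify to the same sentence text, but only the first yields `misspellig.` as a word, which would explain how the dictionary ends up with both `misspellig` and `misspellig.`.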

a2937 commented 2 years ago

Hmm, in the tiny personal spellchecker I made, I just removed most punctuation from words that needed to be spellchecked. Could we do something like this here?

a2937 commented 2 years ago

Maybe removing period, exclamation mark, and question mark nodes from words that need to be spellchecked would help here?
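A minimal TypeScript sketch of that suggestion, assuming the same simplified, hypothetical nlcst-like node shapes: trailing `PunctuationNode` children whose value is `.`, `!` or `?` are dropped from a copy of each `WordNode` before its text goes into a deduplicated dictionary. `stripTrailingPunctuation`, `wordText` and `buildDictionary` are hypothetical helpers, not part of spellchecker-cli or retext.

```typescript
// Simplified, hypothetical nlcst-like types (real nodes have more fields).
type Leaf = { type: "TextNode" | "PunctuationNode" | "SymbolNode"; value: string };
type Word = { type: "WordNode"; children: Leaf[] };

// The punctuation mentioned in the suggestion: periods, exclamation and question marks.
const STRIPPABLE = new Set([".", "!", "?"]);

// Return a copy of the word with trailing punctuation nodes removed;
// the input node itself is not mutated.
function stripTrailingPunctuation(word: Word): Word {
  const children = word.children.slice();
  while (children.length > 0) {
    const last = children[children.length - 1];
    if (last !== undefined && last.type === "PunctuationNode" && STRIPPABLE.has(last.value)) {
      children.pop();
    } else {
      break;
    }
  }
  return { ...word, children };
}

// Concatenate the values of a word's children.
function wordText(word: Word): string {
  return word.children.map((child) => child.value).join("");
}

// Build a sorted dictionary with duplicates removed after stripping.
function buildDictionary(words: Word[]): string[] {
  const entries = new Set(words.map((w) => wordText(stripTrailingPunctuation(w))));
  return Array.from(entries).sort();
}

const withPeriod: Word = {
  type: "WordNode",
  children: [
    { type: "TextNode", value: "misspellig" },
    { type: "PunctuationNode", value: "." },
  ],
};
const plain: Word = { type: "WordNode", children: [{ type: "TextNode", value: "misspellig" }] };

console.log(buildDictionary([withPeriod, plain])); // [ 'misspellig' ]
```

One limitation of this approach: it cannot tell a sentence-ending period apart from one that belongs to the word itself, such as the final period of an abbreviation.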

tbroadley commented 2 years ago

True, that would solve this problem. I'm hesitant to have spellchecker-cli preprocess text too heavily before passing it into Retext. For example, if we removed punctuation, that might conflict with one or more Retext plugins. I have an open issue for adding support for more Retext plugins to spellchecker-cli: #18

I'd rather resolve this issue by fixing the underlying issue in Retext or remark-retext.