retextjs / retext-spell

plugin to check spelling
https://unifiedjs.com
MIT License

Numbers and words with punctuation should be excluded #5

Closed localjo closed 8 years ago

localjo commented 8 years ago

I think numbers and words with punctuation should be excluded by this plugin. It doesn't make much sense to see results like this:

  55:6-55:32     warning  some-filename.json is misspelled  spelling
  59:115-59:124  warning  250 is misspelled                   spelling

That said, it might make sense to include things like e.g. or well-known, so I'm not 100% sure about this. What are your thoughts, @wooorm? Do you think excluding words with punctuation from the plugin would be the right solution, or is there a better one? You've got a lot more experience than I do writing code that deals with natural human language. 🤓

wooorm commented 8 years ago

So I just checked some dictionaries, and it seems they include quite a few words with dashes, dots, and digits.

This means that these phrases can be passed directly to a spell checker and get spelling suggestions.

The first case, some-filename.json, is something which I believe should be highlighted. Either wrapping it in backticks, or putting it in a link with a different label, fixes this. My spell checker on macOS highlights it too.

The second case, \d+, should definitely be OK. I'm not sure whether the spell checker itself should handle it, though, or something that runs before it.

localjo commented 8 years ago

It seems intuitive to me that the spell checker should exclude \d+. I don't see how it could hurt, since there is no valid or invalid "spelling" of digits, and it's probably faster to skip those numbers than to run the check function on them. I don't know how other spell checkers handle this, but excluding them seems like the right call. I'll open a PR that excludes digits.
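To make the idea concrete, here is a minimal sketch of skipping digit-only tokens before they reach the dictionary check. It is not the plugin's actual code: `shouldSkip`, `findMisspelled`, and the tiny word list are hypothetical names made up for illustration, and the `isCorrect` callback stands in for a real dictionary lookup.

```typescript
// Hypothetical helper (an assumption, not part of retext-spell's API):
// matches tokens made up entirely of digits, i.e. the \d+ case.
const DIGITS_ONLY = /^\d+$/;

function shouldSkip(word: string): boolean {
  return DIGITS_ONLY.test(word);
}

// Returns the words that fail the spelling check, never calling
// `isCorrect` for digit-only tokens.
function findMisspelled(
  words: string[],
  isCorrect: (word: string) => boolean
): string[] {
  return words.filter((word) => !shouldSkip(word) && !isCorrect(word));
}

// Stand-in dictionary for the example.
const knownWords = ["the", "file", "has", "lines"];
const flagged = findMisspelled(
  ["the", "file", "has", "250", "lnes"],
  (word) => knownWords.indexOf(word) !== -1
);
```

Because the pattern is anchored at both ends, words that merely contain digits or punctuation (such as well-known) still go through the normal check.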

Regarding the other cases, I think you're right that they shouldn't be excluded. They're frustrating in my use case, but it makes sense to keep them in the spell checker, so I'll exclude them separately in my project.
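For the project-side exclusion, something like the following sketch could work. It assumes the warnings have already been collected into plain objects; the `SpellingWarning` shape, the `dropFilenameWarnings` helper, and the filename pattern are all hypothetical and not something the plugin provides.

```typescript
// Hypothetical shape for a collected warning (an assumption for this sketch).
interface SpellingWarning {
  word: string;
  position: string;
}

// Rough heuristic: a name of word characters and dashes, one dot, then an
// extension. Matches some-filename.json but not e.g. or well-known.
const LOOKS_LIKE_FILENAME = /^[\w-]+\.[A-Za-z0-9]+$/;

function dropFilenameWarnings(warnings: SpellingWarning[]): SpellingWarning[] {
  return warnings.filter((warning) => !LOOKS_LIKE_FILENAME.test(warning.word));
}

const remaining = dropFilenameWarnings([
  { word: "some-filename.json", position: "55:6-55:32" },
  { word: "e.g.", position: "1:1-1:5" },
  { word: "well-known", position: "2:1-2:11" },
]);
```

The pattern is deliberately narrow, so abbreviations and hyphenated words stay visible; it would need loosening for paths or multi-dot names.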

wooorm commented 8 years ago

Oh, this was done already: GH-6.