tbroadley / spellchecker-cli

A command-line tool for spellchecking files.
MIT License

Add option to only spellcheck parts of hyphenated words, not the entire word #16

Status: Open. Opened by tbroadley 6 years ago.

tbroadley commented 6 years ago

For example, check space, infix, and ops when presented with space-infix-ops, but don't try to spell-check space-infix-ops.


Originally, this issue was about spell-checking the parts of a hyphenated word in addition to the whole hyphenated word. It was updated based on the comments below.

bjankord commented 5 years ago

This would be really helpful. We are planning to generate some markdown docs with variable names that will use hyphens between words. Having this feature would be super helpful so we could automatically spellcheck these variable names.

tbroadley commented 5 years ago

@bjankord thanks for the feedback!

I've been thinking about this a bit more. Currently, spellchecker-cli will flag hyphenated words that contain at least one part with a spelling mistake (like spellig in spellig-mistake) and are not included in a custom dictionary. For your use case, would it be enough to add these variable names to a dictionary? Then, any incorrectly typed variable name would be caught by the tool (assuming at least one part contained a spelling mistake) and correctly typed names would be ignored.

Or perhaps it would be more helpful to have an option for the tool to only spellcheck the parts of hyphenated words. For example, it would check spellig and mistake, but not the entire word spellig-mistake. What do you think?
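A minimal sketch of that second option, to make the intended behaviour concrete. It is not how spellchecker-cli works today: `misspelledParts` and the tiny word list are hypothetical, and a real implementation would consult the actual dictionary rather than an array.

```typescript
// Hypothetical helper, not part of spellchecker-cli.
// Splits a hyphenated token and returns only the parts that are not known words.
// The whole token (e.g. "spellig-mistake") is never looked up.
function misspelledParts(token: string, knownWords: ReadonlyArray<string>): string[] {
  return token
    .split("-")
    .filter((part) => part.length > 0)
    .filter((part) => knownWords.indexOf(part.toLowerCase()) === -1);
}

// Stand-in word list, for illustration only.
const sampleWords = ["space", "infix", "ops", "spelling", "mistake"];

console.log(misspelledParts("spellig-mistake", sampleWords)); // [ 'spellig' ]
console.log(misspelledParts("space-infix-ops", sampleWords)); // []
```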

bjankord commented 5 years ago

For my use case, it would be nice to check just spellig and mistake, but not the entire word spellig-mistake.

tbroadley commented 5 years ago

OK thanks! That makes sense to me as a feature.

ryanblock commented 2 years ago

Any word on what became of this feature? We have a large and growing dictionary of valid hyphenations that must be added as custom dictionary words (e.g. pre-provisioning). Thank you!

tbroadley commented 2 years ago

@ryanblock Thanks for the feedback! I'm not actively working on spellchecker-cli, so I don't have plans to add this feature. I'd definitely review and merge a PR that added it.

ryanblock commented 2 years ago

Ok, good to know! Did you have any early research / notes on implementation here to crib if I were to take a swing? Thank you!

tbroadley commented 2 years ago

I haven't looked into how we could implement this. I don't think retext-spell has this capability built in. One option is to look for hyphenated words and replace the hyphens with spaces before passing the text to spellcheck into Retext (e.g. "It pre-provisions a server" would become "It pre provisions a server").
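A rough sketch of that preprocessing step, assuming it runs on the raw text before anything is handed to Retext. The function name `replaceHyphensWithSpaces` is hypothetical, and the pattern only covers hyphens between ASCII letters.

```typescript
// Hypothetical preprocessing helper; not an existing spellchecker-cli or retext-spell API.
function replaceHyphensWithSpaces(text: string): string {
  // Match a hyphen that sits directly between two ASCII letters.
  // The lookahead leaves the following letter unconsumed, so chains
  // such as "space-infix-ops" are handled in a single pass.
  return text.replace(/([A-Za-z])-(?=[A-Za-z])/g, "$1 ");
}

console.log(replaceHyphensWithSpaces("It pre-provisions a server"));
// It pre provisions a server
```

Because each hyphen is swapped for exactly one space, the text keeps its length, so any line and column positions computed on the preprocessed text should still point at the same characters in the original file.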

a2937 commented 2 years ago

> I haven't looked into how we could implement this. I don't think retext-spell has this capability built in. One option is to look for hyphenated words and replace the hyphens with spaces before passing the text to spellcheck into Retext (e.g. "It pre-provisions a server" would become "It pre provisions a server").

But wouldn't that turn words like "x-ray" into "x ray" and make it very difficult to spellcheck in some instances? This comes from experience on a personal project of mine.

tbroadley commented 2 years ago

That's a good point. I think that's the reason why I wouldn't make this the default behaviour for spellchecker-cli. People would have to pass a flag like --replace-hyphens-with-spaces to opt into the behaviour.
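As an illustration of the opt-in idea, here is a sketch in which the text is left untouched unless the flag is present. The flag name is the one floated above and does not exist in spellchecker-cli; `parseOptions` and `preprocess` are likewise hypothetical, and the real CLI's argument parsing may look quite different.

```typescript
// Hypothetical option handling; spellchecker-cli does not currently accept this flag.
interface HyphenOptions {
  replaceHyphensWithSpaces: boolean;
}

function parseOptions(argv: ReadonlyArray<string>): HyphenOptions {
  return {
    replaceHyphensWithSpaces: argv.indexOf("--replace-hyphens-with-spaces") !== -1,
  };
}

function preprocess(text: string, options: HyphenOptions): string {
  // Default behaviour: leave the text alone, so "x-ray" is still checked as one word.
  if (!options.replaceHyphensWithSpaces) {
    return text;
  }
  // Opt-in behaviour: replace hyphens between ASCII letters with spaces.
  return text.replace(/([A-Za-z])-(?=[A-Za-z])/g, "$1 ");
}

console.log(preprocess("an x-ray", parseOptions([])));
// an x-ray
console.log(preprocess("an x-ray", parseOptions(["--replace-hyphens-with-spaces"])));
// an x ray
```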

a2937 commented 2 years ago

I just thought of something kinda related. Should we treat equal signs as spaces as well? While it doesn't show up in normal communication, it does show up in technical documentation on occasion to illustrate a point.

tbroadley commented 2 years ago

Good question. I imagine that Retext would already treat a and b as two separate words in the text a=b, so maybe we don't need to consider that case explicitly. But we can double-check.

ryanblock commented 2 years ago

> But wouldn't that turn words like "x-ray" into "x ray" and make it very difficult to spellcheck in some instances? This comes from experience on a personal project of mine.

This is an example of a hyphenated compound word, and those words should probably be known and ignored by this feature. x-ray is a great example – that is the correct spelling of that word, no need to break it up by hyphen.
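One way this could look, sketched under the assumption that known compounds come from some word list (here a hard-coded array, purely for illustration). `splitUnknownCompounds` is a hypothetical name; nothing like it exists in spellchecker-cli today.

```typescript
// Hypothetical helper: split hyphenated words into parts unless the whole
// compound is already a known word such as "x-ray".
function splitUnknownCompounds(text: string, knownCompounds: ReadonlyArray<string>): string {
  // Matches runs of ASCII letters joined by single hyphens, e.g. "pre-provisioning".
  return text.replace(/[A-Za-z]+(?:-[A-Za-z]+)+/g, (compound) =>
    knownCompounds.indexOf(compound.toLowerCase()) !== -1
      ? compound // known compound: leave intact so it is checked as one word
      : compound.split("-").join(" ") // unknown compound: check each part separately
  );
}

console.log(splitUnknownCompounds("An x-ray of the pre-provisioning step", ["x-ray"]));
// An x-ray of the pre provisioning step
```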