ropensci / spelling

Tools for Spell Checking in R
https://docs.ropensci.org/spelling

Ignore fancy quotes #43

Closed gaborcsardi closed 5 years ago

gaborcsardi commented 5 years ago

I have this in an Rmd:

See ['Configuration'][pkg_config] for details.

which will have fancy quotes in the README.md, courtesy of pandoc I guess:

See [‘Configuration’](TODO) for details.

and then spellcheck reports:

Configuration’ README.md:177

I wonder if it would be easy to ignore the fancy quotes? TBH I am not sure why they are considered to be part of the word.

jeroen commented 5 years ago

Unfortunately, the apostrophe (fancy or not) is explicitly not ignored in English hunspell dictionaries because it is needed for words like it's or let's:

hunspell::hunspell("It’s a beautiful day")

It is actually the only custom wordchar allowed in English words (everything else gets ignored):

> hunspell::dictionary('en_US')
<hunspell dictionary>
 affix: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/hunspell/dict/en_US.aff 
 dictionary: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/hunspell/dict/en_US.dic 
 encoding: UTF-8 
 wordchars: ’ 
 added: 0 custom words

So there is no easy way to do this in hunspell, but perhaps we could manually strip them in spelling when parsing the AST, as we do for the heading identifiers:

https://github.com/ropensci/spelling/blob/569a24f910d78b9adef6e540e4e0b91849842f94/R/parse-markdown.R#L26-L34
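As a rough illustration of what such stripping could look like, here is a minimal base R sketch. The helper name `strip_outer_quotes` is hypothetical and is not part of spelling or hunspell. It assumes English text, treats only ASCII letters as word characters, and works on plain strings rather than on the markdown AST. It removes single quotes (straight or fancy) that sit at the edge of a word and leaves apostrophes inside a word alone.

```r
# Hypothetical helper, not part of the spelling or hunspell packages.
# Removes single quotes that wrap a word but keeps apostrophes that sit
# between two letters (it's, let's).
strip_outer_quotes <- function(text) {
  # Straight quote, left single quote (U+2018), right single quote (U+2019)
  quotes <- "['\u2018\u2019]"

  # Quote that opens a word: at the start of the string or after a
  # non-letter, and directly before a letter.
  opening <- paste0("(^|[^[:alpha:]])", quotes, "(?=[[:alpha:]])")
  text <- gsub(opening, "\\1", text, perl = TRUE)

  # Quote that closes a word: directly after a letter and not followed
  # by another letter.
  closing <- paste0("(?<=[[:alpha:]])", quotes, "(?![[:alpha:]])")
  gsub(closing, "", text, perl = TRUE)
}

strip_outer_quotes("See [\u2018Configuration\u2019](TODO) for details.")
strip_outer_quotes("It\u2019s a beautiful day")
```

One known trade-off of this approach: a trailing possessive apostrophe, as in dogs’, is removed as well, since it cannot be told apart from a closing quote by position alone. For spell checking that is probably acceptable, because the remaining word is still checked.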

gaborcsardi commented 5 years ago

OK, that makes sense. Apparently Unicode prefers to use \u2019 for apostrophe, and pandoc uses it for quoted words, so this might be a pandoc error. Although I am not sure what pandoc could use instead. Maybe I should just ignore README.md for spellchecking, it is a generated file, anyway.
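
The ambiguity can be seen directly in base R. This is only a small check using the two strings from this thread: the closing quote that pandoc emits and the apostrophe in It’s are the same code point, so they cannot be distinguished by character alone.

```r
# Code points of the two example strings from this thread
its    <- utf8ToInt("It\u2019s")
quoted <- utf8ToInt("\u2018Configuration\u2019")

opening_quote <- quoted[1]
closing_quote <- quoted[length(quoted)]
apostrophe    <- its[3]

# Format as U+XXXX for readability
sprintf("U+%04X", c(opening_quote, closing_quote, apostrophe))

# TRUE: the closing quote and the apostrophe are the same character
closing_quote == apostrophe
```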