Closed agitter closed 4 years ago
Changing this pattern to ([%a']+)\n solved my initial problem. However, that pattern still fails to match and return misspelled words that contain accents. "Naaïve" is reported as "ve", which makes it difficult to find the spelling error.
The pattern ([%S]+)\n meets my needs. I could make a pull request to change this, but I'm not sure what the intended behavior is in general for this filter.
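To make the difference between the two patterns concrete, here is a minimal sketch that runs both against simulated aspell output. The `collect` helper and the sample string are made up for illustration, and the result for the accented word assumes Lua is running in the default "C" locale, where `%a` does not match the bytes of a UTF-8 encoded "ï".

```lua
-- Hypothetical helper: collect every capture of `pattern` found in `text`.
local function collect(text, pattern)
  local words = {}
  for word in text:gmatch(pattern) do
    words[#words + 1] = word
  end
  return words
end

-- Simulated aspell output, one word per line. "\195\175" is the UTF-8
-- encoding of "ï", written as bytes so the source file encoding is irrelevant.
local output = "Naa\195\175ve\npandoc's\n"

-- Letters and apostrophes only: the accented word is cut at the non-ASCII bytes.
local letters = collect(output, "([%a']+)\n")

-- Any run of non-space characters: both words survive intact.
local nonspace = collect(output, "(%S+)\n")

print(table.concat(letters, " | "))
print(table.concat(nonspace, " | "))
```

In this sketch the first pattern yields "ve" and "pandoc's", while the second yields the full "Naaïve" and "pandoc's".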
I think one issue with [%S]+ is that it will give bad results in cases where words abut punctuation, e.g.
dogs, cats, and ferrets.
We don't want the commas and periods included here. (Similarly parentheses, dashes, quotation marks.)
But I'm definitely open to other suggestions.
Regarding [%a']+, note that we don't want the single quotes to be included when they're functioning as quotation marks, rather than apostrophes.
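Both concerns can be reproduced on running text, as in the rough sketch below. The `collect` and `trim_punct` helpers and the second sample sentence are hypothetical and are not part of the filter; `trim_punct` is only one possible way to drop surrounding punctuation while keeping word-internal apostrophes.

```lua
-- Hypothetical helper: collect every match of `pattern` found in `text`.
local function collect(text, pattern)
  local words = {}
  for word in text:gmatch(pattern) do
    words[#words + 1] = word
  end
  return words
end

-- %S+ keeps whatever punctuation is attached to a word.
local tokens = collect("dogs, cats, and ferrets.", "%S+")

-- [%a']+ cannot tell a quotation mark from an apostrophe.
local quoted = collect("'bears' and the dog's bowl", "[%a']+")

-- Hypothetical helper: strip leading and trailing punctuation only,
-- so an apostrophe inside a word is preserved.
local function trim_punct(word)
  return (word:gsub("^%p+", ""):gsub("%p+$", ""))
end

print(table.concat(tokens, " | "))
print(table.concat(quoted, " | "))
print(trim_punct(quoted[1]), trim_punct(quoted[4]), trim_punct(tokens[4]))
```

Here `%S+` returns "dogs," and "ferrets." with their punctuation, `[%a']+` returns "'bears'" with both quotation marks, and trimming afterwards gives "bears", "dog's", and "ferrets".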
The version of aspell I'm testing with (version 0.60.7) appears to strip leading and trailing punctuation.
I created a file demo.txt with spelling errors:
dogsxyz, catsxyz, and ferretsxyz.
'bearsxyz' and "wolvesxyz"?
The aspell output does not include any punctuation:
> cat content/demo.txt | aspell list
dogsxyz
catsxyz
ferretsxyz
bearsxyz
wolvesxyz
I'm not familiar enough with aspell to know how universal this behavior is across versions, modes, and languages.
Ah, okay! Why don't you go ahead and submit a PR?
When aspell is run on a raw Markdown file that contains a possessive such as pandoc's, the entire string pandoc's is returned as a misspelled word. When the pandoc spellcheck filter is run on that Markdown file, only the suffix s is returned. The behavior is similar for other words with apostrophes.
I noticed the spellcheck filter uses the following pattern to capture text from the aspell output: https://github.com/pandoc/lua-filters/blob/3c870cb5799fb1c4cb961b6648e5b3cddc50cfde/spellcheck/spellcheck.lua#L33
I'm not confident that is causing this behavior, but the behavior is unexpected. It causes the spellcheck filter to always return the string suffixes that follow an apostrophe, even if those suffixes are in the aspell custom dictionary.
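The reported symptom is what a letters-only capture would produce. The sketch below is an assumption: the thread does not quote the pattern at the linked line, so `(%a+)\n` is only a stand-in for "letters followed by a newline", compared against a class that also accepts apostrophes.

```lua
-- Simulated aspell output for the possessive from the report.
local output = "pandoc's\n"

-- Assumed stand-in for a letters-only capture: the apostrophe breaks the
-- run of letters, so only the part directly before the newline is captured.
local letters_only = output:match("(%a+)\n")

-- Allowing apostrophes in the class captures the whole word.
local with_apostrophe = output:match("([%a']+)\n")

print(letters_only)
print(with_apostrophe)
```

With this input the letters-only capture is "s" and the apostrophe-aware capture is "pandoc's", which is consistent with the suffix-only results described above.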