We do some string manipulation when parsing to ensure that contractions pass through correctly. It's important that "don't" is parsed as "don't" instead of "dont" or "don" "t" to match the entry in the stoplist (and to handle e.g. proper names containing apostrophes).
Documents sometimes use a curly single quote (’, written as `&rsquo;` in HTML) in contractions. This adds some logic to ensure that it is parsed the same way a straight single quote would be.
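
As a rough illustration of the idea (not the code in this change), the normalization could look something like the sketch below. The names `normalize_apostrophes`, `tokenize`, and `TOKEN_RE` are hypothetical, and the token pattern is an assumption about what counts as a word; the actual parser may differ.

```python
import html
import re

# Hypothetical token pattern: runs of letters/digits, optionally joined by
# straight apostrophes, so "don't" and "O'Brien" stay single tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def normalize_apostrophes(text: str) -> str:
    """Decode HTML entities, then map the curly quote (U+2019) to a straight one."""
    text = html.unescape(text)  # turns "&rsquo;" into the U+2019 character
    return text.replace("\u2019", "'")


def tokenize(text: str) -> list[str]:
    """Split text into word tokens, keeping internal apostrophes."""
    return TOKEN_RE.findall(normalize_apostrophes(text))
```

With this sketch, `tokenize("don\u2019t stop")` and `tokenize("don't stop")` produce the same tokens, so a stoplist entry for "don't" would match either spelling.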