What does this change?
At the moment, when we apply suggestions from regular-expression rules, there's sometimes an unexpected side effect: we overwrite the casing of the matched text.
For example, with the regex (?i)\bmedia?eval (note the (?i) flag, which makes it case-insensitive), we always suggest the lowercase word medieval.
This causes problems when the matched word begins a sentence: for example, in "end of sentence. Medieval", the rule matches Medieval and suggests medieval.
This PR preserves case where possible by detecting sentence starts. When a regex match falls at the start of a sentence, and the first characters of the suggestion and the match are the same when case is ignored, we keep the casing of the match.
So, given the regex (?i)\bmedia?eval with the replacement medieval, where square brackets denote a match:
... [mediaeval] castles will produce the suggestion medieval
end of sentence. [Mediaeval] castles will produce the suggestion Medieval
end of sentence. [mediaeval] castles will produce the suggestion Medieval
When the match and the suggestion are identical ignoring case, we still return the suggestion rather than stripping it. This preserves the match's 'mark as correct' behaviour.
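A minimal Python sketch of the casing adjustment, to make the rule above concrete. It is not the PR's implementation: `adjust_suggestion`, `is_sentence_start`, `suggest` and `MEDIEVAL_RULE` are hypothetical names, and a sentence start is assumed to mean either the start of the text or whitespace following ".", "!" or "?". At a sentence start the sketch upper-cases the suggestion's first letter, which gives the suggestions listed above for mid-sentence and sentence-start matches. The adjusted suggestion is always returned, even when it equals the matched text exactly, so the match is never stripped.

```python
import re
from typing import Optional

# Hypothetical rule from the description: case-insensitive, suggests "medieval".
MEDIEVAL_RULE = re.compile(r"(?i)\bmedia?eval")

# Assumed heuristic: the text before the match is empty, or ends with
# sentence-ending punctuation followed only by whitespace.
_SENTENCE_END = re.compile(r"(?:^|[.!?])\s*$")


def is_sentence_start(text: str, match_start: int) -> bool:
    """Return True if a match beginning at match_start appears to start a sentence."""
    return _SENTENCE_END.search(text[:match_start]) is not None


def adjust_suggestion(text: str, match: re.Match, suggestion: str) -> str:
    """Adjust the casing of a suggestion for a regex match (sketch)."""
    matched = match.group(0)
    if not matched or not suggestion:
        return suggestion
    if not is_sentence_start(text, match.start()):
        # Mid-sentence: leave the suggestion exactly as the rule wrote it.
        return suggestion
    if matched[0].lower() != suggestion[0].lower():
        # Different first letter: there is no casing to carry over.
        return suggestion
    # Sentence start: capitalise the first letter. The result is returned even
    # when it equals the matched text, so the match is never stripped.
    return suggestion[0].upper() + suggestion[1:]


def suggest(text: str, rule: re.Pattern, replacement: str) -> Optional[str]:
    """Return the adjusted suggestion for the first match of rule, or None."""
    match = rule.search(text)
    if match is None:
        return None
    return adjust_suggestion(text, match, replacement)


if __name__ == "__main__":
    for sample in (
        "the mediaeval castles",
        "end of sentence. Mediaeval castles",
        "end of sentence. mediaeval castles",
        "end of sentence. Medieval castles",
    ):
        print(sample, "->", suggest(sample, MEDIEVAL_RULE, "medieval"))
```

The punctuation-based heuristic is deliberately naive: abbreviations such as "e.g." are treated as sentence ends, so "e.g. mediaeval" would be given the suggestion Medieval.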
How to test
The unit tests should pass.
For example, the text "End of sentence. Mediaeval" should offer a correction to Medieval. Before this change, it offered medieval.
How can we measure success?
Fewer complaints about incorrect casing in suggestions.
Have we considered potential risks?
This is probably still not perfect, but the rules it applies to are probably better handled by a dictionary in the long run. Wonky edge cases gratefully received.