guardian / typerighter

Even if you’re the right typer, couldn’t hurt to use Typerighter!
Apache License 2.0

Filter out title case matches from dictionary rule matches #425

Closed rhystmills closed 11 months ago

rhystmills commented 1 year ago

What does this change?

This PR filters out Dictionary rule matches that match against Title Case phrases.

Currently, most false-positive spellcheck errors are for proper nouns, usually non-English names and places.

This PR filters out such matches before they are sent to Composer (though the DictionaryMatcher LanguageTool instance still generates such matches).

It also adds some tests which try to cover as many edge cases as possible.
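As a rough illustration of the filtering described above, here is a minimal Python sketch. It is not the code in this PR: `Match`, `is_title_case` and `filter_title_case_matches` are hypothetical names, and "Title Case" is assumed to mean that every whitespace-separated word in the matched text starts with an uppercase letter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for a dictionary rule match."""
    start: int
    end: int
    matched_text: str


def is_title_case(phrase: str) -> bool:
    # Assumption: a phrase is Title Case when every word begins with an
    # uppercase letter. Empty or whitespace-only text is not Title Case.
    words = phrase.split()
    return bool(words) and all(word[0].isupper() for word in words)


def filter_title_case_matches(matches: List[Match]) -> List[Match]:
    # Drop matches whose text looks like a proper noun; keep everything else.
    return [m for m in matches if not is_title_case(m.matched_text)]


candidates = [
    Match(0, 6, "Mbappe"),      # capitalised name: dropped
    Match(7, 12, "scord"),      # lowercase typo: kept
    Match(20, 29, "Nuku Hiva"), # multi-word place name: dropped
]
remaining = filter_title_case_matches(candidates)
```

In this sketch the matcher still produces every match; the filter only decides which ones are passed on, which mirrors the shape of the change described above.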

In action in Composer:

image

How to test

Run locally:

rhystmills commented 1 year ago

Ran tests locally, works as expected.

One question – we'd expect corrections at the start of a sentence to be startcased, e.g. in "... end of sentence. Strat of next sentence", "Strat" would have the correction "Start". Would this PR account for sentence starts?

There's a bit of prior art for detecting and managing sentence starts in RegexMatcher if that's relevant.

This did cross my mind. Currently, it will not catch typos when they're the first word of a sentence or paragraph.

We'd still have the problem of false positive names in these instances, so it's up to us to decide whether spellchecking the first word is worth that risk. What are your thoughts?
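To make the trade-off concrete, here is a hedged Python sketch of one possible option: keep capitalised matches when they sit at a sentence or paragraph start, and suppress them elsewhere. It is not what RegexMatcher does and not part of this PR. `is_sentence_start` and `should_suppress` are hypothetical names, and the sentence-boundary rule (`.`, `!` or `?` followed by whitespace, or a preceding newline) is a simplifying assumption.

```python
import re

# Assumption: a sentence start follows ., ! or ? plus whitespace, or a newline.
_BOUNDARY_BEFORE = re.compile(r"(?:[.!?]\s+|\n\s*)$")


def is_title_case(phrase: str) -> bool:
    # Assumption: every whitespace-separated word starts with an uppercase letter.
    words = phrase.split()
    return bool(words) and all(word[0].isupper() for word in words)


def is_sentence_start(text: str, start: int) -> bool:
    # True when nothing but whitespace precedes the match, or when the text
    # before it ends in a sentence or paragraph boundary.
    preceding = text[:start]
    if preceding.strip() == "":
        return True
    return _BOUNDARY_BEFORE.search(preceding) is not None


def should_suppress(text: str, start: int, end: int) -> bool:
    # Suppress Title Case matches unless they begin a sentence, where the
    # capital letter says nothing about whether the word is a proper noun.
    return is_title_case(text[start:end]) and not is_sentence_start(text, start)


text = "... end of sentence. Strat of next sentence"
typo_start = text.index("Strat")
keep_typo = not should_suppress(text, typo_start, typo_start + len("Strat"))
```

With this rule the "Strat" typo would still be reported, but a name that opens a sentence would be reported too, which is the false positive risk discussed above. Abbreviations such as "e.g." would also be read as sentence ends by this simple boundary check.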

jonathonherbert commented 11 months ago

@rhystmills, do we still need this after #443?

rhystmills commented 11 months ago

We do not - now closed.