guardian / typerighter

Even if you’re the right typer, couldn’t hurt to use Typerighter!
Apache License 2.0

Use opennlp for sentence detection #226

Closed: jonathonherbert closed this 1 year ago

jonathonherbert commented 1 year ago

What does this change?

Swaps CoreNLP for OpenNLP for sentence detection and tokenisation.

How to test

The automated tests should pass. They cover capitalisation of suggestions at sentence starts, with a few variants for odd tokens to account for typos and punctuation.

Run Typerighter locally, or – easier! – in CODE. Run it on a few articles, paying special attention to suggestions at the start of sentences, which should come back capitalised.
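For reviewers who want a concrete picture of the behaviour being checked, here is a minimal sketch of sentence-start capitalisation. It is not Typerighter's code (the project is Scala) and it does not call OpenNLP. The function names `is_sentence_start` and `apply_case` are hypothetical, and the sentence start offsets are assumed to come from whichever sentence detector is in use. The rule that leading punctuation (an opening quote, say) does not stop a match counting as sentence-initial is also an assumption, based on the "odd tokens" the tests mention.

```python
from bisect import bisect_right
from typing import Sequence


def is_sentence_start(text: str, match_start: int, sentence_starts: Sequence[int]) -> bool:
    """True if only whitespace or punctuation sits between the sentence start and the match.

    `sentence_starts` must be sorted character offsets, as a sentence
    detector would report them (hypothetical input, not an OpenNLP type).
    """
    # Index of the last sentence that starts at or before the match.
    idx = bisect_right(sentence_starts, match_start) - 1
    if idx < 0:
        return False  # the match precedes every known sentence
    prefix = text[sentence_starts[idx]:match_start]
    # Assumption: quotes, brackets and spaces before the match are ignored.
    return not any(ch.isalnum() for ch in prefix)


def apply_case(suggestion: str, text: str, match_start: int, sentence_starts: Sequence[int]) -> str:
    """Capitalise the first character of a suggestion when its match opens a sentence."""
    if suggestion and is_sentence_start(text, match_start, sentence_starts):
        return suggestion[0].upper() + suggestion[1:]
    return suggestion


text = 'It rained. "teh end," she said. That was teh plan.'
starts = [0, text.index('"'), text.index("That")]

# The first "teh" follows only an opening quote, so the suggestion is capitalised.
print(apply_case("the", text, text.index("teh"), starts))   # The
# The second "teh" is mid-sentence, so the suggestion is left as-is.
print(apply_case("the", text, text.rindex("teh"), starts))  # the
```

When testing by hand, the same two cases are worth looking for in real articles: a match right after an opening quote or bracket, and a match in the middle of a sentence.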