Open lumpidu opened 3 years ago
Interesting question, and this may well be a use case that we should support better. As is, the code is mostly oriented towards review of continuous text, typically whole sentences.
The code that checks the spelling of a single token is basically around this line. The call to spelling.Corrector.correct() can optionally be provided with a context, i.e. preceding tokens that are then used to adjust the correction probabilities based on a trigram language model. See also the short test function at the bottom of spelling.py.
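The idea of context-sensitive single-token correction described above can be sketched roughly as follows. This is a self-contained illustration only, not GreynirCorrect's actual implementation: the vocabulary, trigram counts, candidate generation, and weighting factor are all invented for the example, and the real Corrector uses its own probability model.

```python
from collections import Counter

# Toy data, purely illustrative (not from GreynirCorrect).
VOCAB = Counter({"hello": 20, "hollow": 5, "help": 50})
# Trigram counts: (prev2, prev1, word) -> count
TRIGRAMS = Counter({("say", "a", "hello"): 3})

def edits1(word):
    """All strings one edit away from word (deletes, swaps, replaces, inserts)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word, context=()):
    """Pick the most likely correction; if two preceding tokens are given,
    re-weight candidates by a trigram count (factor 10 is arbitrary)."""
    candidates = {w for w in edits1(word) | {word} if w in VOCAB} or {word}
    def score(w):
        base = VOCAB[w]
        if len(context) >= 2:
            base *= 1 + 10 * TRIGRAMS[(context[-2], context[-1], w)]
        return base
    return max(candidates, key=score)

# Without context, raw frequency wins; with context, the trigram flips the choice.
print(correct("helo"))              # most frequent candidate
print(correct("helo", ("say", "a")))  # context-adjusted candidate
```

The point of the sketch is the shape of the call: a single token can be corrected on its own, but passing the preceding tokens as context lets the language model override plain frequency ranking.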
At least the documentation of tokenize() doesn't state assumptions about the text structure, in contrast to the documentation of the methods check() and check_single(). Yes, this use case exists, e.g. for spell checking of web input forms, where often only single words or short text terms are entered.
I want to use Greynir-Correct for correction of non-whole sentences, i.e. in extreme cases single words. What method or options should I use to make that possible?
Currently, when using the tokenize() method with the option only_ci=True, it complains about the following.

Sample code: