Closed by organisciak 4 years ago
There would be value in an optional cleaning flag that applies opinionated but uncontroversial OCR corrections to the tokens, e.g.:

- `^\W\w+` or `\w+\W$` -> Words that have punctuation attached to the first or last index, like `"token` or `token—`
- Long s (`ſ`) to a regular `s`
- Ligatures (`ﬂ ﬀ ﬃ ﬄ`, `Ꝏ ꝏ`)
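To make the proposal concrete, here is a minimal sketch of what such a cleaning step might look like. Everything in it is an assumption for illustration: `clean_token`, the `CHAR_FIXES` table, and the choice to expand `Ꝏ`/`ꝏ` to `OO`/`oo` are hypothetical names and decisions, not part of any existing API.

```python
import re

# Hypothetical replacement table (an assumption, not an existing API).
# Covers only the characters named in this issue.
CHAR_FIXES = {
    "\u017f": "s",    # long s
    "\ufb00": "ff",   # ff ligature
    "\ufb02": "fl",   # fl ligature
    "\ufb03": "ffi",  # ffi ligature
    "\ufb04": "ffl",  # ffl ligature
    "\ua74e": "OO",   # capital OO ligature (expansion is an assumption)
    "\ua74f": "oo",   # small oo ligature (expansion is an assumption)
}
_CHAR_TABLE = str.maketrans(CHAR_FIXES)

# One non-word character directly before the first word character.
_LEADING = re.compile(r"^\W(?=\w)")
# One non-word character directly after the last word character.
_TRAILING = re.compile(r"(?<=\w)\W$")


def clean_token(token: str) -> str:
    """Apply the character fixes, then strip one punctuation mark per edge."""
    token = token.translate(_CHAR_TABLE)
    token = _LEADING.sub("", token)
    token = _TRAILING.sub("", token)
    return token
```

With this sketch, `clean_token('"token')` and `clean_token('token—')` both return `token`, while a token made up only of punctuation (such as `--`) or one with internal punctuation (such as `don't`) is left unchanged. Stripping only a single character per edge mirrors the two patterns above; whether repeated punctuation should also be stripped is left open.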