zuphilip closed 7 years ago
Interesting solution, looks good. Do we want to restrict words to letters and numbers (excluding other printable characters)?
Well, I want to exclude punctuation and anything that does not look like a "word". Yes, splitting on \W+ (runs of non-word characters) makes sense to me. Do you encounter any problems with the current solution?
This is tested with the ersch (fraktur) example from ocropy.
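For reference, the \W+ split discussed above could look roughly like the sketch below. It is only an illustration, not the code from this pull request: the helper name `split_words` is hypothetical, and it assumes Python 3, where \W is Unicode-aware for str patterns.

```python
import re

def split_words(text):
    """Hypothetical helper: split text on runs of non-word characters.

    Punctuation and whitespace act as separators; empty pieces that
    re.split produces at the start or end of the string are dropped.
    """
    return [word for word in re.split(r"\W+", text) if word]

print(split_words("Hello, world! 42"))
```

Note that \w also matches the underscore and, in Python 3, non-ASCII letters such as umlauts or the long s found in Fraktur text, so those stay inside a word. A hyphenated word is split at the hyphen.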