grammatek / text-cleaner

Python package for text cleaning.
Apache License 2.0

Cleaner handles soft hyphens in an unexpected way. #13

Open G-Thor opened 2 years ago

G-Thor commented 2 years ago

The text cleaner seems to replace soft hyphens (U+00AD) with spaces instead of simply removing them, which is the more desirable outcome. As a result, words that contain a soft hyphen are split in two.

To reproduce, copy text directly from a site such as mbl.is, whose news texts include soft hyphens, and paste it into the cleaner.
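
For illustration, here is a minimal sketch of the difference between the two behaviours. It does not use the text-cleaner API; `strip_soft_hyphens` is a hypothetical helper and the sample word is made up for the example.

```python
# Hypothetical helper, not part of text-cleaner: shows the desired handling.
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, an invisible hyphenation hint


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so the surrounding word parts stay joined."""
    return text.replace(SOFT_HYPHEN, "")


# Made-up sample: one word with a soft hyphen in the middle.
sample = "frétta" + SOFT_HYPHEN + "stofa"

# Reported behaviour: the soft hyphen becomes a space and the word is split.
split_word = sample.replace(SOFT_HYPHEN, " ")

# Desired behaviour: the soft hyphen is dropped and the word stays whole.
joined_word = strip_soft_hyphens(sample)

print(repr(split_word))
print(repr(joined_word))
```

With removal the word comes back as a single token, while replacing with a space yields two tokens.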

bnika commented 2 years ago

Thanks, I just pushed a fix.