qurator-spk / dinglehopper

An OCR evaluation tool
Apache License 2.0

Feature request: fold to unaccented & lowercase #31

Open jbarth-ubhd opened 3 years ago

jbarth-ubhd commented 3 years ago

For search engine performance evaluation, it would be nice to be able to compare text based on its base characters (e.g. Ä → a).

An option to ignore punctuation would be nice, too.
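
To make the request concrete, here is a minimal sketch of the kind of folding meant, using only Python's standard `unicodedata` module. The function name `fold_text` and its `strip_punctuation` parameter are hypothetical and are not part of dinglehopper's API; this only illustrates the idea, under the assumption that "base character" means the text after Unicode decomposition with combining marks removed.

```python
import unicodedata


def fold_text(text: str, strip_punctuation: bool = False) -> str:
    """Fold text to unaccented, case-insensitive form (hypothetical helper)."""
    # NFKD decomposition splits e.g. "Ä" into "A" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        # Drop combining marks (Unicode categories Mn, Mc, Me).
        if category.startswith("M"):
            continue
        # Optionally drop punctuation (Unicode categories starting with "P").
        if strip_punctuation and category.startswith("P"):
            continue
        kept.append(ch)
    # casefold() is a more aggressive lower(); note that it maps "ß" to "ss".
    return "".join(kept).casefold()


print(fold_text("Ä"))
print(fold_text("Café, déjà vu!", strip_punctuation=True))
```

Letters without a Unicode decomposition, such as "ø", pass through this sketch unchanged, so a real implementation would probably need an additional mapping table for them.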

mikegerber commented 3 years ago

This is basically #11.