pfefferniels / probstuecke-digital

A digital edition of the 24 Probstücke of the Oberclasse by Johann Mattheson.
http://probstuecke-digital.de
GNU General Public License v3.0

Deal with wrong normalizations #124

Open pfefferniels opened 2 years ago

pfefferniels commented 2 years ago

Text normalization using DTA's CAB service generally works fine, but it introduces some mistakes: for example, "Schleiffer" gets normalized to "Schluffer", and there are a few more cases like it. We should probably collect those and correct them again after normalization.
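A minimal sketch of what such a post-normalization correction could look like, assuming the normalizer's output is available as (original, normalized) token pairs. The names `OVERRIDES` and `apply_overrides` are hypothetical, no CAB call is made here, and the target form "Schleifer" is only an assumed intended normalization, not something confirmed in this thread.

```python
# Hypothetical override table: original spelling -> intended normalization.
# Only the "Schleiffer" -> "Schluffer" mistake comes from this issue;
# the corrected target "Schleifer" is an assumption for illustration.
OVERRIDES = {"Schleiffer": "Schleifer"}


def apply_overrides(pairs, overrides=OVERRIDES):
    """Replace known-bad normalizations, keyed by the original token.

    pairs: iterable of (original, normalized) tuples from the normalizer.
    Tokens without an override keep the normalizer's output unchanged.
    """
    return [(orig, overrides.get(orig, norm)) for orig, norm in pairs]


# Example input as it might come back from the normalization step.
pairs = [("Der", "Der"), ("Schleiffer", "Schluffer")]
corrected = apply_overrides(pairs)
```

Keying the table by the original token rather than by the wrong output avoids accidentally rewriting a word that legitimately normalizes to the same string.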

pfefferniels commented 2 years ago

further mistakes

pfefferniels commented 2 years ago

Not sure if relevant, but CAB seems to make use of an "exception lexicon"

rettinghaus commented 2 years ago

Could we report those errors to help improve the language model?

pfefferniels commented 2 years ago

Yes, absolutely. I'd suggest collecting as many as possible and reporting them in one go. Not sure where exactly to report them, though … (Bryan Jurish perhaps?)
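One possible way to gather the cases for a single report is a small tab-separated list. This is only a sketch: the function name `mistakes_to_tsv` and the column names are hypothetical, and the "expected" value shown is an assumed correct form, not taken from CAB or from this thread.

```python
import csv
import io


def mistakes_to_tsv(mistakes):
    """Serialize (original, cab_output, expected) triples as TSV text.

    Duplicates are dropped and rows are sorted so the report is stable.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["original", "cab_output", "expected"])
    for row in sorted(set(mistakes)):
        writer.writerow(row)
    return buf.getvalue()


# The same case listed twice ends up as one row in the report.
report = mistakes_to_tsv([
    ("Schleiffer", "Schluffer", "Schleifer"),  # "expected" is an assumption
    ("Schleiffer", "Schluffer", "Schleifer"),
])
```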