ybracke/transnormer
A lexical normalizer for historical spelling variants using a transformer architecture.
GNU General Public License v3.0
Loading functionality for Anselm corpus #48 (Open)
ybracke opened this issue 1 year ago
Source: https://github.com/coastalcph/histnorm/tree/master/datasets/historical/german
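A loader for this source could start as small as the sketch below. It assumes, without having verified every file, that the histnorm German data is UTF-8 plain text with one token per line and the historical and normalized forms separated by a tab. The names `read_token_pairs` and `load_anselm_file` are hypothetical and not part of transnormer.

```python
from typing import Iterable, List, Tuple


def read_token_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse (historical, normalized) token pairs from tab-separated lines.

    Assumed format: one token per line, historical form and normalized
    form separated by a tab. Blank lines are skipped; any columns beyond
    the second are ignored.
    """
    pairs: List[Tuple[str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


def load_anselm_file(path: str) -> List[Tuple[str, str]]:
    """Read one (hypothetical) histnorm data file from disk."""
    with open(path, encoding="utf-8") as f:
        return read_token_pairs(f)
```

Keeping the parsing separate from file I/O makes it easy to test on in-memory lines before pointing it at the real files.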
Normalization guidelines
- The orig and normalized layers are all lowercased.
- The normalized layer contains some errors; is it really manually annotated?
- The text is not split into sentences (see #25).
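Two of these notes affect how the data can be fed to a model: the lowercasing can be confirmed mechanically, and the missing sentence boundaries need some stand-in unit. The sketch below takes a list of (historical, normalized) token pairs and groups them into fixed-size windows. This is a stopgap, not sentence segmentation; the helper names `is_all_lowercase` and `chunk_pairs` and the default `max_tokens=20` are hypothetical choices, not anything transnormer defines.

```python
from typing import Iterator, List, Sequence, Tuple


def is_all_lowercase(pairs: Sequence[Tuple[str, str]]) -> bool:
    """Return True if both the historical and normalized columns are lowercase."""
    return all(o == o.lower() and n == n.lower() for o, n in pairs)


def chunk_pairs(
    pairs: Sequence[Tuple[str, str]], max_tokens: int = 20
) -> Iterator[Tuple[str, str]]:
    """Group token pairs into pseudo-sentences of at most max_tokens tokens.

    The corpus is not sentence-split, so fixed-size windows stand in for
    sentences (an assumption, not a linguistic segmentation). Yields
    (historical_text, normalized_text) with tokens joined by spaces.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    for start in range(0, len(pairs), max_tokens):
        window: List[Tuple[str, str]] = list(pairs[start:start + max_tokens])
        orig = " ".join(o for o, _ in window)
        norm = " ".join(n for _, n in window)
        yield orig, norm
```

Because both sides of each window come from the same token pairs, the historical and normalized strings stay aligned token for token, which a real sentence splitter would also need to preserve.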