ybracke/transnormer
A lexical normalizer for historical spelling variants using a transformer architecture.
GNU General Public License v3.0
Data: Add CAB-normalized versions of DTA as training data
#35 Closed
ybracke closed this issue 1 year ago
ybracke commented 1 year ago
[x] External: Create a variant of the DTA-Kernkorpus (dtak) JSON Lines files that includes an additional field for the CAB-normalized text. See dt2jsonl.
In addition to the original texts, use the normalized versions from here.
[x] Add loading facilities to transnormer (and apply conversion to the interim format).
ybracke commented 1 year ago
Closed with PR #39