Open ybracke opened 1 year ago
Norma might be the easiest approach, see ~/code/norma/HOWTO.md
What we would need:
Data to train Norma
Format: norm\torig (tab-separated)
historic lexicon as "target" lexicon (create from dta, does something exist already?)
modern text to convert into historic variant
serialization script to convert single token text plus byte offsets back to sentences
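For the serialization step, a minimal sketch of what such a script could look like. Everything here is an assumption rather than an existing interface: the function name `detokenize`, the `(token, start, end)` tuple layout, and the idea that offsets are byte positions (end exclusive) into the UTF-8 encoding of the original sentence. The example sentence and its converted token are made up for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical layout: (converted token, start byte, end byte), end exclusive.
Token = Tuple[str, int, int]


def detokenize(original: str, tokens: Iterable[Token]) -> str:
    """Splice converted tokens back into the original sentence.

    Offsets are assumed to be byte positions in the UTF-8 encoding of
    `original`. Bytes not covered by any token (whitespace, punctuation)
    are copied over unchanged.
    """
    raw = original.encode("utf-8")
    pieces = []
    cursor = 0
    for text, start, end in sorted(tokens, key=lambda t: t[1]):
        if start < cursor or end < start or end > len(raw):
            raise ValueError(f"overlapping or out-of-range offsets: {start}-{end}")
        pieces.append(raw[cursor:start])      # untouched material before the token
        pieces.append(text.encode("utf-8"))   # the converted token itself
        cursor = end
    pieces.append(raw[cursor:])               # trailing material after the last token
    return b"".join(pieces).decode("utf-8")


# Illustrative input only: "Tür" occupies bytes 4-8 because "ü" is two bytes in UTF-8.
sentence = "Die Tür ist groß."
converted = [("Die", 0, 3), ("Thür", 4, 8), ("ist", 9, 12), ("groß", 13, 18)]
result = detokenize(sentence, converted)
```

Working on the encoded bytes rather than on the string keeps the offsets valid for non-ASCII characters, and copying the gaps verbatim means the original spacing and punctuation survive even if the tokenizer dropped them.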
Possible strategies:
Material: