ybracke / transnormer

A lexical normalizer for historical spelling variants using a transformer architecture.
GNU General Public License v3.0

Remove or update `split_dataset` #86

Open ybracke opened 7 months ago

ybracke commented 7 months ago

`split_dataset` does not do what it claims to do, namely produce a genre-stratified split. The code that actually does that exists only on my local machine (~/code/dta2jsonl/genre_stratified_splits.py). The simplest fix is to remove split_dataset.py and the corresponding section in the README.
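
For context, here is a rough sketch of what a genre-stratified split means: records are grouped by genre and each group is divided with the same proportions, so every split keeps roughly the original genre distribution. This is not the code from genre_stratified_splits.py; the function name, the `genre` key, the split names, and the 80/10/10 fractions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def genre_stratified_split(records, fractions=(0.8, 0.1, 0.1), genre_key="genre", seed=42):
    """Split records into train/validation/test, stratified by genre.

    Hypothetical helper: `records` is assumed to be a list of dicts that
    each carry a genre label under `genre_key`.
    """
    rng = random.Random(seed)

    # Group the records by their genre label.
    by_genre = defaultdict(list)
    for record in records:
        by_genre[record[genre_key]].append(record)

    splits = {"train": [], "validation": [], "test": []}
    # Iterate genres in sorted order so the result is reproducible.
    for genre in sorted(by_genre):
        group = by_genre[genre]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        # Whatever remains after train and validation goes to test.
        splits["train"].extend(group[:n_train])
        splits["validation"].extend(group[n_train:n_train + n_val])
        splits["test"].extend(group[n_train + n_val:])
    return splits


# Toy data: 10 records of one genre, 20 of another.
records = [{"id": i, "genre": "sermon"} for i in range(10)]
records += [{"id": 100 + i, "genre": "novel"} for i in range(20)]
splits = genre_stratified_split(records)
```

With very small genres the rounding can leave a split empty for that genre, so a real implementation would probably need a minimum-size check.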