huu4ontocord / rio

Text pre-processing for NLP datasets
Apache License 2.0

Refactor code from process.py #12

Open huu4ontocord opened 2 years ago

huu4ontocord commented 2 years ago

Move some of the loading and processing code (anonymization, etc.) out of process.py and into separate files. @justinphan3110 began this, and we can complete it. @edugp, I wrote some code that automatically loads kenlm models from your HF repo. I think it would be good to have this directly in the kenlm model file.
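
As a rough illustration of what "loading code directly in the kenlm model file" could look like, here is a minimal sketch. It is not the existing code from process.py. The class name `KenlmModel`, its fields, and the helpers `_hub_download` and `_kenlm_open` are hypothetical, and the repo id and file name would have to match whatever the HF repo actually contains. The only library calls assumed are `huggingface_hub.hf_hub_download` and `kenlm.Model`, both imported lazily so the module can be imported without them.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def _hub_download(repo_id: str, filename: str) -> str:
    """Fetch one file from the Hugging Face Hub and return its local path."""
    # Lazy import: only needed when a model is actually downloaded.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


def _kenlm_open(path: str):
    """Open a KenLM model file from a local path."""
    # Lazy import: only needed when a model is actually opened.
    import kenlm

    return kenlm.Model(path)


@dataclass
class KenlmModel:
    """Hypothetical wrapper that owns both loading and scoring."""

    repo_id: str    # assumption: the HF repo that hosts the model files
    filename: str   # assumption: name of the model file inside that repo
    download: Callable[[str, str], str] = _hub_download
    open_model: Callable[[str], object] = _kenlm_open
    _model: Optional[object] = field(default=None, init=False, repr=False)

    def load(self):
        """Download and open the model on first use, then reuse it."""
        if self._model is None:
            local_path = self.download(self.repo_id, self.filename)
            self._model = self.open_model(local_path)
        return self._model

    def score(self, text: str) -> float:
        """Return the score the underlying model assigns to `text`."""
        return self.load().score(text)
```

With something like this, process.py would only need to construct a `KenlmModel` and call `score`; the download and caching details stay in one place. The `download` and `open_model` parameters are there so the class can be tested without network access.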