shaigue / pmi_masking

This repository contains code that takes a text corpus and creates a PMI masking vocabulary for it.
MIT License

Run medium BookCorpus on a Linux system to verify that everything works, recover logging, and analyze the output #25

Open shaigue opened 1 year ago

shaigue commented 1 year ago

-- When running on an Azure free-tier VM, the program stopped. I suspect this was caused by large batch sizes combined with limited RAM. To avoid it, I have reduced the default batch sizes (for the tokenizer and for n-gram counting).
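To illustrate why a smaller batch size lowers peak memory, here is a minimal sketch of batched n-gram counting. It is not this repository's code: the names `batched`, `count_ngrams`, and the `batch_size` parameter are hypothetical, and the input is assumed to be an iterable of already tokenized sequences.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most `batch_size` items from `items`.

    Only one batch is materialized at a time, so the per-batch
    working set grows with `batch_size`.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_ngrams(
    token_sequences: Iterable[List[str]],
    n: int,
    batch_size: int = 1000,  # hypothetical default, not the repository's value
) -> Counter:
    """Count n-grams over tokenized sequences, one batch at a time."""
    totals: Counter = Counter()
    for batch in batched(token_sequences, batch_size):
        batch_counts: Counter = Counter()
        for tokens in batch:
            # Slide a window of length n over the token list.
            for i in range(len(tokens) - n + 1):
                ngram: Tuple[str, ...] = tuple(tokens[i:i + n])
                batch_counts[ngram] += 1
        # Merge the batch result into the running totals.
        totals.update(batch_counts)
    return totals
```

The counts do not depend on `batch_size`; only the amount of data held per batch does. Note that this sketch still keeps the running totals in memory, so it only shows the per-batch effect, not a full out-of-core pipeline.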