Open rwightman opened 1 year ago
We can test that. Pix2Struct did this as a pre-pretraining stage on screenshots from BookCorpus. A similar "warmup" dataset is something we can reuse for many tasks, and it's not too complicated to make.
Creating a curriculum-learning warmup dataset; see the code in https://github.com/huggingface/pixparse-data/pull/14
With issues like #9, can training be improved by starting at a lower resolution, possibly also with simpler documents (i.e. generated or large-font docs, as in Pix2Struct pretraining), before moving on to higher resolution with more complex documents?
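
To make the idea concrete, here is a rough sketch of what a step-based curriculum schedule could look like. Everything in it is an assumption for illustration: the `Stage` fields, the stage names, step boundaries, image sizes, and dataset labels are hypothetical and are not taken from pixparse or Pix2Struct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    until_step: int  # stage is active while step < until_step
    image_size: int  # square input resolution used during this stage
    dataset: str     # hypothetical dataset label, not a real pixparse name


# Hypothetical schedule: low-res + simple docs first, then ramp up.
STAGES = (
    Stage("warmup", 10_000, 448, "synthetic_large_font"),
    Stage("mid", 50_000, 768, "mixed"),
    Stage("full", 10**9, 1024, "real_documents"),
)


def stage_for_step(step: int, stages=STAGES) -> Stage:
    """Return the first stage whose step boundary has not been reached."""
    for stage in stages:
        if step < stage.until_step:
            return stage
    # Past the last boundary: stay on the final (hardest) stage.
    return stages[-1]
```

A training loop could call `stage_for_step(step)` at each step (or each interval) and rebuild the dataloader only when the returned stage changes. Whether hard step boundaries or a gradual mix between stages works better is an open question here.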