huggingface / pixparse

Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data

[Explore] Curriculum training / progressive resolution #10

Open rwightman opened 1 year ago

rwightman commented 1 year ago

With issues like #9, could training be improved by starting at a lower resolution, possibly also with simpler documents (i.e. generated or large-font docs, as in the Pix2Struct pretraining), before moving on to training at higher resolution with more complex documents?
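To make the idea concrete, here is a minimal sketch of a step-based curriculum schedule. Everything in it is an assumption for illustration: the `Stage` class, the `stage_for_step` helper, the step boundaries, the resolutions, and the dataset labels are hypothetical and do not come from the pixparse code base.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical structure, not a pixparse API)."""
    start_step: int  # first training step at which this stage applies
    image_size: int  # square input resolution in pixels
    dataset: str     # label of the data mix used in this stage


# Hypothetical schedule: low resolution and simple documents first,
# then higher resolution with more complex documents.
# Must be sorted by start_step, and the first stage must start at step 0.
SCHEDULE = (
    Stage(start_step=0, image_size=224, dataset="warmup_large_font"),
    Stage(start_step=10_000, image_size=448, dataset="simple_docs"),
    Stage(start_step=30_000, image_size=896, dataset="complex_docs"),
)


def stage_for_step(step: int, schedule: Sequence[Stage] = SCHEDULE) -> Stage:
    """Return the last stage whose start_step is <= step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    current = schedule[0]
    for stage in schedule:
        if step >= stage.start_step:
            current = stage
        else:
            break
    return current
```

A training loop could call `stage_for_step(global_step)` once per step (or per epoch) and rebuild the image transform and data loader whenever the returned stage changes.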

molbap commented 1 year ago

We can test that. Pix2Struct did this as a pre-pretraining stage on screenshots from BookCorpus. A "warmup" dataset like that can be reused for many tasks, and it's not too complicated to make.

molbap commented 1 year ago

Creating a curriculum learning warmup dataset; see the code in https://github.com/huggingface/pixparse-data/pull/14