molbap closed this 1 year ago
This is still in progress, I'm running tests on
I think this one is ready. It should close the other one as well; no conflicts afaik. This adds
@rwightman if you find anything weird here, lmk. I think we're good to merge in the current state. Notes:
This PR adds a fine-tuning task to pixparse, focused on simple document classification on RVL-CDIP.
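For context, the objective such a classification fine-tune optimizes can be sketched without any framework. This is a toy, dependency-free illustration, not the head pixparse actually uses: it assumes a pooled feature vector from the encoder, a hypothetical linear head producing one logit per class (RVL-CDIP has 16 document classes), and standard softmax cross-entropy.

```python
import math
from typing import List, Sequence

# RVL-CDIP labels each document image with one of 16 classes.
NUM_CLASSES = 16


def linear_head(features: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Hypothetical linear head: map a pooled feature vector to one logit per class."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    # log(sum(exp(z))) computed as m + log(sum(exp(z - m))) to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# With an untrained (all-zero) head every class is equally likely,
# so the loss equals log(NUM_CLASSES).
zero_logits = [0.0] * NUM_CLASSES
loss = cross_entropy(zero_logits, target=3)
```

In a real run the features come from the pretrained pixparse encoder and the loss is averaged over a batch; the point here is only that the task reduces to a 16-way softmax over document-level features.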
New args are added to app/train.py.
A fine-tuning classification run on RVL-CDIP can then be launched as
loader.py is modified to support datasets that are neither in webdataset format nor stored on S3, namely Hugging Face datasets from the datasets library. This uses chug's LoaderBundle https://github.com/huggingface/chug/blob/cfb16882e1058b37871b61fe8f76830cef3d8750/src/chug/common/types.py#L19. Eventually this should move into chug https://github.com/huggingface/chug/issues/2.
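The general idea of wrapping a map-style dataset (anything with len() and integer indexing that returns a dict per row, as a Hugging Face datasets.Dataset does) in a loader plus a small bundle of metadata can be sketched as below. This is an assumption-laden illustration, not the code in this PR: SimpleLoaderBundle is a hypothetical stand-in for chug's LoaderBundle (the real class's fields may differ), and MapStyleLoader and create_map_style_loader are hypothetical names. A real implementation would use a torch DataLoader with shuffling, workers and a proper collate function.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Sequence


@dataclass
class SimpleLoaderBundle:
    """Hypothetical stand-in for chug's LoaderBundle; real fields may differ."""
    loader: Any
    num_batches: int = 0
    num_samples: int = 0


class MapStyleLoader:
    """Iterate an indexable dataset of dict rows in fixed-size, in-order batches."""

    def __init__(self, dataset: Sequence[Dict[str, Any]], batch_size: int, drop_last: bool = False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __len__(self) -> int:
        n = len(self.dataset)
        if self.drop_last:
            return n // self.batch_size
        return (n + self.batch_size - 1) // self.batch_size  # ceil division

    def __iter__(self) -> Iterator[Dict[str, List[Any]]]:
        n = len(self.dataset)
        for b in range(len(self)):
            start = b * self.batch_size
            rows = [self.dataset[i] for i in range(start, min(start + self.batch_size, n))]
            # Collate a list of per-sample dicts into a dict of lists.
            yield {key: [row[key] for row in rows] for key in rows[0]}


def create_map_style_loader(dataset, batch_size: int, drop_last: bool = False) -> SimpleLoaderBundle:
    """Hypothetical helper: build the loader and report batch/sample counts."""
    loader = MapStyleLoader(dataset, batch_size, drop_last)
    num_samples = len(loader) * batch_size if drop_last else len(dataset)
    return SimpleLoaderBundle(loader=loader, num_batches=len(loader), num_samples=num_samples)
```

Usage, with a plain list of dicts standing in for an HF dataset split: `create_map_style_loader(rows, batch_size=4)` over 10 rows reports 3 batches and 10 samples, with a final short batch of 2. Returning counts alongside the loader is what lets the training loop size its schedule without caring whether the data came from webdataset shards or the datasets library.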