shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
MIT License
255 stars, 40 forks

Inference for token classification. #49

Open Akhilesh64 opened 1 year ago

Akhilesh64 commented 1 year ago

Hi @uakarsh, I trained DocFormer on a custom dataset and got a checkpoint file, but I have no idea how to perform inference on test images. Can you show me how to run inference on images or, if possible, share a code snippet for it?

riteshKumarUMass commented 1 year ago

I found this Kaggle notebook really helpful. You can refer to the last section of the notebook for inference.
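
Separately from the notebook, here is a rough sketch of the decoding step that usually follows the forward pass in token classification. It assumes the model (in eval mode, with gradients disabled) returns one row of logits per token, and that the tokenizer provides a word index for each token, with `None` for special and padding tokens. The helper `decode_token_predictions`, the `id2label` mapping, and the label names are hypothetical and not part of this repo; loading the checkpoint and building the model inputs are not shown.

```python
from typing import Dict, List, Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_token_predictions(
    logits: Sequence[Sequence[float]],
    word_ids: Sequence[Optional[int]],
    id2label: Dict[int, str],
) -> List[str]:
    """Turn per-token logits into one predicted label per word.

    logits:   one row of class scores per token, shape [seq_len][num_labels]
    word_ids: word index for each token, None for special/padding tokens
    id2label: hypothetical mapping from class id to label name

    Assumption: a word split into several sub-tokens takes the label
    predicted for its first sub-token.
    """
    word_labels: Dict[int, str] = {}
    for token_logits, word_id in zip(logits, word_ids):
        if word_id is None:
            continue  # skip special and padding tokens
        if word_id in word_labels:
            continue  # keep only the first sub-token of each word
        word_labels[word_id] = id2label[argmax(token_logits)]
    return [word_labels[i] for i in sorted(word_labels)]


# Toy example with made-up scores and labels
id2label = {0: "O", 1: "B-HEADER", 2: "B-ANSWER"}
logits = [
    [2.0, 0.1, 0.3],  # special token, ignored
    [0.1, 3.0, 0.2],  # word 0, first sub-token
    [0.5, 0.4, 2.5],  # word 0, second sub-token, ignored
    [0.2, 0.1, 4.0],  # word 1
    [1.0, 0.0, 0.0],  # padding, ignored
]
word_ids = [None, 0, 0, 1, None]
print(decode_token_predictions(logits, word_ids, id2label))
```

Taking the first sub-token's label is only one convention; if your training labels were aligned differently (for example, every sub-token labelled), the aggregation should match that instead.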

Did you pre-train the model? Several of us found that the model overfits the training data without prior pre-training. Could you share your observations on this?