shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal, transformer-based architecture for Visual Document Understanding (VDU)
MIT License

Some changes in the script #33

Closed uakarsh closed 2 years ago

uakarsh commented 2 years ago

What does this pull request do?

  1. Fixes the device issue (as mentioned in one of the issues)
  2. Provides an option to pass already extracted bounding boxes for feature extraction
  3. Contains examples of using DocFormer with PyTorch Lightning for data loading and pre-training, plus a sample training run on the RVL-CDIP dataset
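
To illustrate item 2, here is a minimal sketch of what "use already extracted bounding boxes if given, otherwise run OCR" could look like. It is not the code from this pull request: `resolve_boxes`, its parameters, and the `ocr_fn` callback are hypothetical names chosen for the example, and the box layout `(x0, y0, x1, y1)` in pixels is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed box layout for this sketch: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def resolve_boxes(
    image_path: str,
    words: Optional[Sequence[str]] = None,
    boxes: Optional[Sequence[Box]] = None,
    ocr_fn: Optional[Callable[[str], Tuple[List[str], List[Box]]]] = None,
) -> Tuple[List[str], List[Box]]:
    """Return (words, boxes), preferring caller-supplied values over OCR.

    Hypothetical helper: none of these names come from the docformer repo.
    """
    # Words and boxes only make sense as a pair.
    if (words is None) != (boxes is None):
        raise ValueError("pass both words and boxes, or neither")

    if words is not None and boxes is not None:
        if len(words) != len(boxes):
            raise ValueError("words and boxes must have the same length")
        # Precomputed path: skip OCR entirely.
        return list(words), [tuple(b) for b in boxes]

    if ocr_fn is None:
        raise ValueError("no precomputed boxes and no OCR function given")

    # Fallback path: delegate to whatever OCR callable the caller supplies.
    return ocr_fn(image_path)
```

With precomputed inputs, a call such as `resolve_boxes("page.png", words=["Invoice"], boxes=[(10, 20, 90, 40)])` never invokes OCR, which is useful when a dataset already ships with annotations.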