shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal, transformer-based architecture for Visual Document Understanding (VDU)
MIT License

DocFormer for Token Classification. #39

Closed. Akhilesh64 closed this issue 2 years ago.

Akhilesh64 commented 2 years ago

Hi, first of all, great work. I wanted to ask whether DocFormer can be used for token classification, like Microsoft Research's LayoutLM series of models, which support tasks such as token classification, document image classification, and visual question answering. If it can, how can we adapt the model to token classification?

uakarsh commented 2 years ago

Hi there,

This question has been asked a few times. I am currently planning to implement DocFormer for token classification. The changes won't be large: only the functions create_features and get_tokens_with_boxes in the dataset.py file need to change.
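For readers wondering what such a dataset change usually involves, here is a minimal sketch, assuming the common LayoutLM-style convention for token classification: each word's bounding box is copied to every subword token, only the first subword keeps the word's label, and the remaining subwords get -100 so the loss ignores them. The helper name `align_word_labels_to_tokens`, the toy tokenizer, and `max_len=512` are hypothetical and for illustration only; this is not the code in DocFormer's dataset.py.

```python
from typing import Callable, List, Tuple

# Conventional "ignore" label: PyTorch's CrossEntropyLoss skips targets equal
# to its ignore_index, which defaults to -100.
IGNORE_LABEL = -100

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed already normalized


def align_word_labels_to_tokens(
    words: List[str],
    boxes: List[Box],
    labels: List[int],
    tokenize: Callable[[str], List[str]],
    max_len: int = 512,
) -> Tuple[List[str], List[Box], List[int]]:
    """Hypothetical helper: expand word-level boxes and labels to subword tokens.

    Every subword inherits its word's bounding box; only the first subword
    keeps the word's label, the rest get IGNORE_LABEL so they are not scored.
    """
    if not (len(words) == len(boxes) == len(labels)):
        raise ValueError("words, boxes and labels must have the same length")
    tokens: List[str] = []
    token_boxes: List[Box] = []
    token_labels: List[int] = []
    for word, box, label in zip(words, boxes, labels):
        for i, piece in enumerate(tokenize(word)):
            tokens.append(piece)
            token_boxes.append(box)
            token_labels.append(label if i == 0 else IGNORE_LABEL)
    # Truncate to the model's sequence length (padding is omitted here).
    return tokens[:max_len], token_boxes[:max_len], token_labels[:max_len]


def toy_tokenize(word: str) -> List[str]:
    # Hypothetical stand-in for a real WordPiece tokenizer: split the word into
    # 4-character chunks and prefix continuation pieces with "##".
    return [word[:4]] + ["##" + word[i:i + 4] for i in range(4, len(word), 4)]


tokens, token_boxes, token_labels = align_word_labels_to_tokens(
    ["Invoice", "Total"], [(10, 10, 80, 20), (10, 30, 50, 40)], [1, 2], toy_tokenize
)
print(tokens)        # ['Invo', '##ice', 'Tota', '##l']
print(token_labels)  # [1, -100, 2, -100]
```

Labelling only the first subword keeps exactly one prediction per word, which is what word-level evaluation expects; labelling every subword is an equally valid alternative if predictions are aggregated afterwards.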

I will release it shortly. Thanks for your kind words!

Regards, Akarsh

Akhilesh64 commented 2 years ago

Sure, thanks a lot.

uakarsh commented 2 years ago

Hi @Akhilesh64, you can find the code for DocFormer token classification here.

Regards,