shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
MIT License

DocFormer for key-value pairs extraction #45

Open hjerbii opened 1 year ago

hjerbii commented 1 year ago

Hello,

Is it possible to train DocFormer on a key-value (or question-answer) extraction task? If so, could you please explain the approach?

Thanks!

uakarsh commented 1 year ago

Since DocFormer is an encoder, I think you can use it for your question-answering task. You would need to add code that encodes the question and attach a prediction head on top of the encoder output.

I think this link can help you with it, since conceptually there is a lot of similarity between the two: https://github.com/uakarsh/latr
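To make the "attach a prediction head" idea concrete, here is a minimal PyTorch sketch for the key-value case, framed as per-token classification. It is only an illustration under assumptions: `TokenClassificationHead`, `EncoderWithHead`, the three-label scheme (other / key / value), and all sizes are hypothetical names and values, and a plain `nn.TransformerEncoder` stands in for the DocFormer encoder, so this is not this repository's actual API.

```python
import torch
import torch.nn as nn


class TokenClassificationHead(nn.Module):
    """Hypothetical head: maps per-token encoder features to label logits."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, hidden_size) -> (batch, seq_len, num_labels)
        return self.classifier(self.dropout(features))


class EncoderWithHead(nn.Module):
    """Hypothetical wrapper: runs an encoder, then the prediction head."""

    def __init__(self, encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.head = head

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(inputs))


# Assumed sizes and label set (0 = other, 1 = key, 2 = value).
hidden_size, num_labels = 64, 3

# Stand-in encoder: replace with the real document encoder, which would
# consume image, text, and bounding-box inputs instead of random features.
layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4, batch_first=True)
stand_in_encoder = nn.TransformerEncoder(layer, num_layers=2)

model = EncoderWithHead(
    stand_in_encoder, TokenClassificationHead(hidden_size, num_labels)
)

# Fake batch: 2 documents, 16 tokens each, already embedded.
features = torch.randn(2, 16, hidden_size)
labels = torch.randint(0, num_labels, (2, 16))
labels[:, -4:] = -100  # mark the last 4 positions as padding to be ignored

logits = model(features)
loss = nn.functional.cross_entropy(
    logits.view(-1, num_labels), labels.view(-1), ignore_index=-100
)
loss.backward()  # gradients flow into both the head and the encoder
```

For extractive question answering, one common variation (again an assumption, not something this thread confirms for DocFormer) is to include the question tokens in the input sequence and use a head with two outputs per token, read as start and end logits of the answer span.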

Hope this helps

Regards, Akarsh