shabie / docformer

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)
MIT License

How to replicate FUNSD dataset for question answering #16

Closed: mayankpathaklumiq closed this issue 2 years ago

mayankpathaklumiq commented 2 years ago

I have tried to implement FUNSD dataset question answering, but I am confused about how to use DocFormer's multi-modal feature output.

uakarsh commented 2 years ago

One possible solution could be:

  1. The output of the DocFormer encoder has shape (batch_size, 512, 768); extract it following the steps described in the readme.
  2. Extract features from the question (Visual Question Answering models are a good reference for how language features are extracted), combine them with the DocFormer encoder output (by concatenation or some other fusion method), and then apply linear layers to get outputs of the desired sequence length; see the sketch below this list.

This is just a high-level overview, as far as I have tried to implement it.
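To make step 2 concrete, here is a minimal PyTorch sketch of one way to fuse the two modalities. It assumes the DocFormer encoder output has shape (batch_size, 512, 768) as in the readme; the question encoder (a BERT model loaded via Hugging Face `transformers`) and the concatenation-based fusion head are illustrative choices for this sketch, not part of this repo.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class DocFormerQAHead(nn.Module):
    """Illustrative fusion head: concatenates DocFormer encoder features
    with a pooled question vector and predicts extractive answer-span logits."""

    def __init__(self, hidden_size=768):
        super().__init__()
        # Question encoder: any language model works; BERT is an assumption here.
        self.question_encoder = AutoModel.from_pretrained("bert-base-uncased")
        # After concatenating per-token doc features (768) with the broadcast
        # question vector (768), each position is represented by 1536 dims.
        self.fusion = nn.Linear(hidden_size * 2, hidden_size)
        # Two logits per token: answer-span start and end.
        self.qa_outputs = nn.Linear(hidden_size, 2)

    def forward(self, doc_features, question_ids, question_mask):
        # doc_features: (batch_size, 512, 768) from the DocFormer encoder
        q_out = self.question_encoder(input_ids=question_ids,
                                      attention_mask=question_mask)
        # Use the [CLS] vector as a single question summary: (batch_size, 768)
        q_vec = q_out.last_hidden_state[:, 0]
        # Broadcast the question vector over the 512 document positions
        q_vec = q_vec.unsqueeze(1).expand(-1, doc_features.size(1), -1)
        fused = torch.relu(self.fusion(torch.cat([doc_features, q_vec], dim=-1)))
        logits = self.qa_outputs(fused)               # (batch_size, 512, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
q = tokenizer(["What is the total amount?"], return_tensors="pt",
              padding="max_length", max_length=32, truncation=True)
doc_features = torch.randn(1, 512, 768)  # stand-in for real DocFormer output
model = DocFormerQAHead()
start_logits, end_logits = model(doc_features, q["input_ids"], q["attention_mask"])
```

Concatenation plus a linear layer is the simplest fusion; cross-attention between the question tokens and the document features is a common alternative if this baseline is not strong enough.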

For more pointers, you can search for medical Visual Question Answering with Transformers, and replace the standard transformer encoder in those pipelines with DocFormer.

And once you are done, please let us know the results: it would benefit the community, and we would know that DocFormer works for QA as well. If you have any more questions, let me know. :)