Closed mayankpathaklumiq closed 2 years ago
One of the possible solutions could be.....
This was just a high-level overview, as far as I have tried to implement it.
For more answers, you can try to search medical Visual Question Answering with Transformers, and replace each of the normal transformers, with DocFormer, and boom, you are done.
And, I would request, that once you are done, please let us know the results, since it would benefit the community, and we would know that, it is working on QA as well. If you have any more questions, let me know. :)
I have tried to implement funsd dataset question answering but I am consfued how to use docformer multi modal features output.