uakarsh / latr

Implementation of LaTr: Layout-Aware Transformer for Scene-Text VQA, a novel multimodal architecture for Scene Text Visual Question Answering (STVQA)
https://uakarsh.github.io/latr/
MIT License
52 stars, 7 forks