felixdittrich92 closed 2 years ago
Hi @felixdittrich92,
Thanks for your message, it would be a pleasure to have you contribute to the lib!
We already have a recognition model that includes a transformer decoder (MASTER), but we do not yet have full transformer architectures such as ViT or TrOCR. It is on the mid-term roadmap, and if you would like to propose your implementation, you are more than welcome to open a PR! :pray:
Please read the CONTRIBUTING section and feel free to look at the models already implemented in doctr :smiley:
Thank you and have a nice day :+1:
I will do, thanks :) :+1:
Hi @felixdittrich92, do you still plan to implement this? If not, we may close this issue to avoid a huge stack of unaddressed ones!
Huhu @charlesmindee :wave:, yes of course (maybe a slightly lighter version with MobileViT), but I think for the moment other things, like a fix for MASTER and SAR, are more important, so I would say let's hold this until 1.0.0, wdyt? :+1:
ok
@felixdittrich92 Hi, are there any model weights available for ViTSTR that are compatible with doctr? :)
I saw these ones but they seem to be named differently I suppose: https://github.com/roatienza/deep-text-recognition-benchmark/releases
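If the only difference really is the parameter naming, one common approach is to rename the checkpoint keys before loading them. A minimal sketch of that idea follows, assuming the architectures match layer for layer. The key names and prefixes are hypothetical and taken from neither repository, and plain dicts stand in for a real state dict:

```python
from typing import Any, Dict


def rename_keys(state_dict: Dict[str, Any], prefix_map: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `state_dict` with key prefixes rewritten.

    `prefix_map` maps an old prefix to its replacement. The first matching
    prefix (in insertion order) wins, so list more specific prefixes first.
    Keys that match no prefix are copied unchanged.
    """
    renamed: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        renamed[new_key] = value
    return renamed


# Hypothetical key names, for illustration only.
external = {
    "module.vitstr.blocks.0.attn.qkv.weight": [0.1],
    "module.vitstr.head.weight": [0.2],
}
# Hypothetical mapping: the more specific "head" prefix comes first.
prefix_map = {
    "module.vitstr.head.": "head.",
    "module.vitstr.": "feat_extractor.",
}
converted = rename_keys(external, prefix_map)
print(sorted(converted))
```

With real PyTorch weights, the same renaming would be applied to the dict returned by `torch.load` before calling `load_state_dict`. Whether the shapes and layer layout actually line up would still need to be checked.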
Adding Vision Transformer for scene text recognition: I am currently working on this (with a Hugging Face ViT backbone). Once I am done and have solid results, it would be a pleasure for me to add this model, if you are interested!? :) Same for the new unilm/TrOCR model.