mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0
3.75k stars, 428 forks

Adding ViTSTR #513

Closed: felixdittrich92 closed this issue 2 years ago

felixdittrich92 commented 3 years ago

Adding a Vision Transformer for scene text recognition: I am currently working on this (with a Hugging Face ViT backbone). Once I am done and have solid results, it would be a pleasure for me to add this model, if you are interested! :) Same for the new unilm/TrOCR model.

charlesmindee commented 3 years ago

Hi @felixdittrich92,

Thanks for your message, it would be a pleasure to have you contribute to the lib!

We already have a recognition model that includes a transformer decoder (MASTER), but we do not yet have full transformer architectures such as ViT or TrOCR. It is on the mid-term roadmap, and if you would like to propose your implementation, you are more than welcome to open a PR! :pray:

Please read the CONTRIBUTING section and feel free to look at the models already implemented in doctr :smiley:

Thank you and have a nice day :+1:

felixdittrich92 commented 3 years ago

I will do, thanks :) :+1:

charlesmindee commented 2 years ago

Hi @felixdittrich92, do you still plan to implement this? If not, we may close this issue to avoid a huge stack of unaddressed ones!

felixdittrich92 commented 2 years ago

Huhu @charlesmindee :wave:, yes, of course (maybe a slightly lighter version with MobileViT), but I think for the moment other things, like a fix for MASTER and SAR, are more important, so I would say let's hold this for 1.0.0, wdyt? :+1:

charlesmindee commented 2 years ago

ok

chpatrick commented 1 year ago

@felixdittrich92 Hi, are there any model weights available for ViTSTR that are compatible with doctr? :)

I saw these, but I suppose they are named differently: https://github.com/roatienza/deep-text-recognition-benchmark/releases