mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

Request For Adding ParSeq - text recognition model #1003

Closed nikokks closed 1 year ago

nikokks commented 2 years ago

πŸš€ The feature

Hello,

I mainly use the text detection and text recognition models with your framework.

As far as I can see, the most recent models you propose for text recognition, namely MASTER and SAR, are not yet operational.

However, at the text recognition level, there is a recent model that achieves very impressive performance: ParSeq.

Here are the references:
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
- https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
- https://github.com/baudm/parseq

Would it be possible to add this text recognition model to the ones you propose?

Thanks a lot for your work!
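For context, ParSeq's key idea (per the linked paper, "Scene Text Recognition with Permuted Autoregressive Sequence Models") is to train a single decoder under several orderings of the output positions, so at inference it can decode left-to-right, right-to-left, or refine iteratively. A rough, hypothetical sketch of building such a permutation set and the per-ordering attention masks (pure Python; the function names and the default of 3 permutations are illustrative, not baudm's or doctr's actual API):

```python
import math
import random

def build_permutations(seq_len: int, num_perms: int = 3, seed: int = 0) -> list:
    """Return a set of decoding orders over `seq_len` positions (assumes
    seq_len >= 2): the canonical left-to-right order, its reverse, and
    random shuffles to fill up to `num_perms` distinct orderings."""
    rng = random.Random(seed)
    forward = list(range(seq_len))
    perms = [forward, forward[::-1]]  # always keep these two orderings
    target = min(num_perms, math.factorial(seq_len))  # cap at what exists
    while len(perms) < target:
        p = forward[:]
        rng.shuffle(p)
        if p not in perms:  # avoid duplicate orderings
            perms.append(p)
    return perms

def causal_mask_for(perm: list) -> list:
    """Attention mask for one ordering: position perm[i] may attend to
    every position that comes earlier in this ordering (perm[:i])."""
    n = len(perm)
    mask = [[False] * n for _ in range(n)]
    for i, query_pos in enumerate(perm):
        for key_pos in perm[:i]:
            mask[query_pos][key_pos] = True
    return mask
```

In the real model, these orderings shape the decoder's attention masks during training only; inference typically uses the plain left-to-right order plus optional iterative refinement.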

Motivation, pitch

I'm working with text recognition models, and a recent state-of-the-art model outperforms the alternatives on all test datasets. I would like to use this model with your framework's pipelines.

Alternatives

No response

Additional context

No response

felixdittrich92 commented 2 years ago

Hi @nikokks 👋, FYI: MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.

I agree with the ParSeq addition. Initially I had in mind to add ViTSTR, but yes, I think we can switch directly to ParSeq instead 👍

Would you maybe be interested in opening a PR for this model? We would be happy to help! Otherwise we could take it on in the near future; we definitely plan to add more models, and I totally agree that ParSeq should be one of them :)

Have you tested the model's latency on CPU? It would be interesting to see.
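On the CPU latency question, a minimal way to measure it using only the standard library (a generic timing helper; `fn` is any callable, e.g. a loaded predictor applied to a sample page — not a doctr API):

```python
import statistics
import time

def benchmark_cpu(fn, *args, warmup: int = 3, runs: int = 20):
    """Time `fn(*args)`: a few warmup calls first (to trigger caches and
    lazy initialization), then `runs` timed calls.
    Returns (mean_ms, median_ms, stdev_ms)."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return (statistics.mean(samples),
            statistics.median(samples),
            statistics.stdev(samples))
```

For example, `benchmark_cpu(lambda: predictor([page]))` once a predictor and a page are loaded (hypothetical usage).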

frgfm commented 2 years ago

I agree that this could be a good candidate for the new text recognition models in 0.6.0 :) Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?

felixdittrich92 commented 2 years ago

@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)

frgfm commented 2 years ago

@felixdittrich92 Done :)

felixdittrich92 commented 2 years ago

@nikokks @frgfm Do you agree we can close this ticket? It should be fine if we track it in the mentioned issue :)

frgfm commented 2 years ago

I think we should keep this:

So there's no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)

felixdittrich92 commented 2 years ago

πŸ‘

nikokks commented 2 years ago

I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq). I think it might not be too complicated to integrate.

On my side, I managed to hook into your ocr_predictor and integrate baudm's reco_predictor successfully. The performance is actually not too good for French documents like yours => it needs to be fine-tuned on your secret data 😉
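For anyone attempting the same glue, the connection described above essentially amounts to cropping the word boxes returned by doctr's detection stage and feeding the crops to an external recognizer. A hypothetical sketch of the coordinate plumbing only (pure Python; it assumes detection outputs relative `(xmin, ymin, xmax, ymax)` boxes, and the recognizer call itself is left out):

```python
def rel_box_to_abs(box, width, height):
    """Convert a relative (xmin, ymin, xmax, ymax) box to absolute
    pixel coordinates for an image of the given width and height."""
    xmin, ymin, xmax, ymax = box
    return (int(xmin * width), int(ymin * height),
            int(xmax * width), int(ymax * height))

def crop_words(page, boxes):
    """page: an H x W image as a list of rows; boxes: relative boxes.
    Returns one crop (a list of rows) per detected word, ready to be
    passed to an external recognition model."""
    height, width = len(page), len(page[0])
    crops = []
    for box in boxes:
        x0, y0, x1, y1 = rel_box_to_abs(box, width, height)
        crops.append([row[x0:x1] for row in page[y0:y1]])
    return crops
```

With real images you would do the same slicing on a NumPy array; the relative-to-absolute conversion is the part that matters.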

I will have several questions about the implementation and integration choices in doctr.

You can close this issue

felixdittrich92 commented 2 years ago

Hi @nikokks πŸ‘‹ ,

Let's keep it open for further conversation about ParSeq 👍

About your points:

We are definitely happy to help. I would say, when you are ready, open a PR (starting with the classification ViT integration) and we iterate on it. WDYT?

nikokks commented 2 years ago

Hi,

ok for timm.

Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?

felixdittrich92 commented 2 years ago

Hi @nikokks 👋, why do you think we need this? :) We should implement all models in plain PyTorch / TensorFlow, without any wrappers.

nikokks commented 2 years ago

OK, that sounds good to me :)

I have added a ParSeq class on my fork. Now I need to match all the arguments of each method between your wrapper and the ParSeq model class =) (the most difficult part). I'll do it in the next few days or next weekend.

felixdittrich92 commented 2 years ago

@nikokks I would suggest the following steps (each should be one PR):

frgfm commented 2 years ago

> • move models/recognition/transformer to models/modules/transformer (to reuse the implementation; we need this for many more models, so it should be more global. @frgfm WDYT?)

I agree :+1:

> • implement ViT as a classification model

Yup, but giving credit to the rightful contributors / sources of inspiration when relevant!

felixdittrich92 commented 2 years ago

@nikokks Now you can reuse the already implemented transformer parts for ViT :+1:

felixdittrich92 commented 2 years ago

Hi @nikokks, a short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should be easy to implement the decoder from ParSeq as well 👍

felixdittrich92 commented 2 years ago

Hi @nikokks, are you still interested in adding ParSeq? :) After #1063, ViT has finally found its way into doctr as a backbone. And #1055 will be updated soon (it could also be used as a template for ParSeq).

nikokks commented 1 year ago

Hello, I am currently implementing ParSeq. It is now working with quite good predictions :) Do you have any advice for making a good pull request? Best

felixdittrich92 commented 1 year ago

Hey @nikokks πŸ‘‹ ,

You can take a look at https://github.com/mindee/doctr/pull/1055/files, which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :) Otherwise, open a PR and we will guide you step by step 👍

PS: If you only have the PT implementation, that's fine; we can port it to TF later :)

felixdittrich92 commented 1 year ago

Finished after #1227 and #1228 are merged :)