mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

Request For Adding ParSeq - text recognition model #1003

Closed nikokks closed 1 year ago

nikokks commented 2 years ago

πŸš€ The feature

Hello,

I mainly use the text detection and text recognition models with your framework.

As far as I can see, the most recent models you propose for text recognition, namely MASTER and SAR, are not yet operational.

However, at the text recognition level, there is a recent model that achieves very impressive performance: ParSeq.

Here are the references:
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
- https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
- https://github.com/baudm/parseq

Would it be possible to add this text recognition model to the ones you propose?

Thanks a lot for your work!
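For context, ParSeq's key idea (per the linked paper, "Scene Text Recognition with Permuted Autoregressive Sequence Models") is to train a single decoder under several orderings of the output positions, so at inference it can decode left-to-right, right-to-left, or refine iteratively. A rough, hypothetical sketch of building such a permutation set and the per-ordering attention masks (pure Python; the function names and the default of 3 permutations are illustrative, not baudm's or doctr's actual API):

```python
import math
import random

def build_permutations(seq_len: int, num_perms: int = 3, seed: int = 0) -> list:
    """Return a set of decoding orders over `seq_len` positions (assumes
    seq_len >= 2): the canonical left-to-right order, its reverse, and
    random shuffles to fill up to `num_perms` distinct orderings."""
    rng = random.Random(seed)
    forward = list(range(seq_len))
    perms = [forward, forward[::-1]]  # always keep these two orderings
    target = min(num_perms, math.factorial(seq_len))  # cap at what exists
    while len(perms) < target:
        p = forward[:]
        rng.shuffle(p)
        if p not in perms:  # avoid duplicate orderings
            perms.append(p)
    return perms

def causal_mask_for(perm: list) -> list:
    """Attention mask for one ordering: position perm[i] may attend to
    every position that comes earlier in this ordering (perm[:i])."""
    n = len(perm)
    mask = [[False] * n for _ in range(n)]
    for i, query_pos in enumerate(perm):
        for key_pos in perm[:i]:
            mask[query_pos][key_pos] = True
    return mask
```

In the real model, these orderings shape the decoder's attention masks during training only; inference typically uses the plain left-to-right order plus optional iterative refinement.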

Motivation, pitch

I'm working with text recognition models, and a recent state-of-the-art model outperforms the alternatives on all test datasets. I would like to use this model with your framework's pipelines.

Alternatives

No response

Additional context

No response

felixdittrich92 commented 2 years ago

Hi @nikokks 👋, FYI: MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.

I agree with the ParSeq addition. Initially I had in mind to add ViTSTR, but yes, I think we can switch directly to ParSeq instead 👍

Would you maybe be interested in opening a PR for this model? We would be happy to help! Otherwise we could take it on in the near future; we definitely plan to add more models, and I totally agree that ParSeq should be one of them :)

Have you tested the model's latency on CPU? It would be interesting to see.
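On the CPU latency question, a minimal way to measure it using only the standard library (a generic timing helper; `fn` is any callable, e.g. a loaded predictor applied to a sample page — not a doctr API):

```python
import statistics
import time

def benchmark_cpu(fn, *args, warmup: int = 3, runs: int = 20):
    """Time `fn(*args)`: a few warmup calls first (to trigger caches and
    lazy initialization), then `runs` timed calls.
    Returns (mean_ms, median_ms, stdev_ms)."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return (statistics.mean(samples),
            statistics.median(samples),
            statistics.stdev(samples))
```

For example, `benchmark_cpu(lambda: predictor([page]))` once a predictor and a page are loaded (hypothetical usage).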

frgfm commented 2 years ago

I agree that this could be a good candidate for the new text recognition models in 0.6.0 :) Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?

felixdittrich92 commented 2 years ago

@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)

frgfm commented 2 years ago

@felixdittrich92 Done :)

felixdittrich92 commented 2 years ago

@nikokks @frgfm Do you agree we can close this ticket? It should be fine if we track it in the mentioned issue :)

frgfm commented 2 years ago

I think we should keep this:

So there's no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)

felixdittrich92 commented 2 years ago

πŸ‘

nikokks commented 2 years ago

I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq). I think it might not be too complicated to integrate.

On my side, I managed to hook into your ocr_predictor and integrate baudm's reco_predictor successfully. The performance is actually not too good for French documents like yours => it needs to be fine-tuned on your secret data 😉
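For anyone attempting the same glue, the connection described above essentially amounts to cropping the word boxes returned by doctr's detection stage and feeding the crops to an external recognizer. A hypothetical sketch of the coordinate plumbing only (pure Python; it assumes detection outputs relative `(xmin, ymin, xmax, ymax)` boxes, and the recognizer call itself is left out):

```python
def rel_box_to_abs(box, width, height):
    """Convert a relative (xmin, ymin, xmax, ymax) box to absolute
    pixel coordinates for an image of the given width and height."""
    xmin, ymin, xmax, ymax = box
    return (int(xmin * width), int(ymin * height),
            int(xmax * width), int(ymax * height))

def crop_words(page, boxes):
    """page: an H x W image as a list of rows; boxes: relative boxes.
    Returns one crop (a list of rows) per detected word, ready to be
    passed to an external recognition model."""
    height, width = len(page), len(page[0])
    crops = []
    for box in boxes:
        x0, y0, x1, y1 = rel_box_to_abs(box, width, height)
        crops.append([row[x0:x1] for row in page[y0:y1]])
    return crops
```

With real images you would do the same slicing on a NumPy array; the relative-to-absolute conversion is the part that matters.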

I will have several questions about the implementation and integration choices in doctr.

You can close this issue

felixdittrich92 commented 2 years ago

Hi @nikokks πŸ‘‹ ,

Let's keep it open for further conversation about ParSeq 👍

About your points:

We are definitely happy to help. I would say, when you are ready, open a PR (starting with the classification ViT integration) and we iterate on it. WDYT?

nikokks commented 2 years ago

Hi,

ok for timm.

Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?

felixdittrich92 commented 2 years ago

Hi @nikokks 👋, why do you think we need this? :) We should implement all models in plain PyTorch / TensorFlow, without any wrappers.

nikokks commented 2 years ago

OK, that sounds good to me :)

I have added a ParSeq class on my fork. Now I need to match all the arguments of each method between your wrapper and the ParSeq model class =) (the most difficult part). I'll do it in the next few days or next weekend.

felixdittrich92 commented 2 years ago

@nikokks I would suggest the following steps (each should be one PR):

frgfm commented 2 years ago

> • move models/recognition/transformer to models/modules/transformer (to reuse the implementation; we need this for many more models, so it should be more global. @frgfm WDYT?)

I agree :+1:

> • implement ViT as a classification model

Yup, but giving credit to the rightful contributors / sources of inspiration when relevant!

felixdittrich92 commented 2 years ago

@nikokks Now you can reuse the already implemented transformer parts for ViT :+1:

felixdittrich92 commented 2 years ago

Hi @nikokks, a short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should be easy to implement the decoder from ParSeq as well 👍

felixdittrich92 commented 2 years ago

Hi @nikokks, are you still interested in adding ParSeq? :) After #1063, ViT has finally found its way into doctr as a backbone. And #1055 will be updated soon (it could also be used as a template for ParSeq).

nikokks commented 1 year ago

Hello, I am currently implementing ParSeq. It is now working with quite good predictions :) Do you have any advice for making a good pull request? Best

felixdittrich92 commented 1 year ago

Hey @nikokks πŸ‘‹ ,

You can take a look at https://github.com/mindee/doctr/pull/1055/files, which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :) Otherwise, open a PR and we will guide you step by step 👍

PS: If you only have the PT implementation, that's fine; we can port it to TF later :)

felixdittrich92 commented 1 year ago

Finished after #1227 and #1228 are merged :)