Hi @nikokks, FYI MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.
I agree with the ParSeq addition. Initially I had ViTSTR in mind, but yes, I think we can switch directly to ParSeq instead.
Would you be interested in opening a PR for this model? We would be happy to help with it! Otherwise we could take it on in the near future; we definitely have more models on track to add, and I totally agree that ParSeq should be one of them :)
Do you have experience with / have you tested the model's latency on CPU? That would be interesting to see.
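For reference, a minimal way to get such a CPU latency number for the pretrained ParSeq checkpoint from torch.hub could look like the sketch below (the 32x128 input size is an assumption based on the model's default, and the iteration counts are arbitrary):

```python
# Hedged sketch: rough CPU latency measurement for the baudm/parseq torch.hub checkpoint.
import time
import torch

model = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()
dummy = torch.rand(1, 3, 32, 128)  # assumed input size (batch of one word crop)

with torch.no_grad():
    for _ in range(5):          # warmup iterations
        model(dummy)
    start = time.perf_counter()
    for _ in range(20):         # timed iterations
        model(dummy)
    print(f"mean latency: {(time.perf_counter() - start) / 20 * 1000:.1f} ms")
```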
I agree that this could be a good candidate for new text recognition models in 0.6.0 :) Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?
@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. wdyt? Would you like to open it? :)
@felixdittrich92 Done :)
@nikokks @frgfm Do you agree with closing this ticket? It should be fine if we track it in the mentioned issue :)
I think we should keep this:
So no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)
I am inspecting the baudm code for ParSeq (https://github.com/baudm/parseq). I think the code should not be too complicated to integrate.
On my side, I managed to connect your ocr_predictor and integrate baudm's reco_predictor successfully. The performance is not that good for French documents like yours at the moment => it needs to be fine-tuned on your private data.
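For context, a rough sketch of that kind of glue (doctr's detection stage feeding word crops to the baudm/parseq checkpoint) could look like the following; the exact shape of the detection predictor's output differs between doctr versions, so treat this as illustrative only:

```python
# Hedged sketch: doctr detection + baudm/parseq recognition, not the final doctr integration.
import torch
from torchvision import transforms
from doctr.io import DocumentFile
from doctr.models import detection_predictor

det_predictor = detection_predictor(arch="db_resnet50", pretrained=True)
parseq = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()

# ParSeq expects 32x128 crops normalized to [-1, 1]
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((32, 128)),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5),
])

pages = DocumentFile.from_images("sample.jpg")   # list of HxWxC numpy arrays
boxes = det_predictor(pages)[0]                  # assumed: relative (xmin, ymin, xmax, ymax, score) rows
h, w = pages[0].shape[:2]

words = []
with torch.no_grad():
    for xmin, ymin, xmax, ymax, _ in boxes:
        crop = pages[0][int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]
        probs = parseq(preprocess(crop).unsqueeze(0)).softmax(-1)
        labels, _ = parseq.tokenizer.decode(probs)   # greedy decoding via the hub model's tokenizer
        words.append(labels[0])
print(words)
```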
Several questions will come from me about the implementation choices and the integration into doctr:
etc.
I will clarify my questions next weekend.
You can close this issue
Hi @nikokks,
let's keep it open for further discussion about ParSeq.
About your points:
We are definitely happy to help. I would say that when you are ready, open a PR (starting with the classification ViT integration) and we iterate on it, wdyt?
Hi,
OK for timm.
Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?
Hi @nikokks, why do you think we need this? :) We should implement all models in plain PyTorch / TensorFlow without any wrappers.
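To illustrate the point about avoiding wrappers (the class below is purely hypothetical, not doctr's actual API): the model itself only needs to be a plain nn.Module, and anyone who wants pytorch-lightning can wrap it on their own side instead of it being a library requirement.

```python
# Hypothetical illustration: a recognition model as a plain nn.Module, no Lightning base class.
import torch
from torch import nn

class ToyRecognitionModel(nn.Module):
    """Tiny backbone + linear head returning per-step character logits."""

    def __init__(self, vocab_size: int, d_model: int = 256) -> None:
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d((1, 32)),   # collapse height, keep 32 horizontal steps
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x).squeeze(2).transpose(1, 2)  # (N, 32, d_model)
        return self.head(feats)                              # (N, 32, vocab_size)

# Users who prefer pytorch-lightning can still wrap this model in a LightningModule
# in their own training code, without the library depending on it.
```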
OK, that sounds good to me :)
I have added a ParSeq class on my fork. Now I need to match all the arguments of each method between your wrapper and the ParSeq model class (the most difficult part). I'll do it in the coming days or next weekend.
@nikokks I would suggest the following steps (each should be one PR):
- move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global. @frgfm wdyt?)
I agree :+1:
- implement ViT as a classification model
Yup, but giving credit to the rightful contributors / sources of inspiration when relevant!
@nikokks Now you can reuse the already implemented transformer parts for ViT :+1:
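As a rough sketch of what "ViT as a classification model" boils down to (plain PyTorch with generic names and arbitrary hyperparameters; the actual doctr modules, their names and signatures may differ): a patch embedding, a stack of transformer encoder blocks, and a classification head on the class token.

```python
# Generic ViT-style classifier sketch, not doctr's actual implementation.
import torch
from torch import nn

class MiniViT(nn.Module):
    def __init__(self, num_classes: int, img_size: int = 32, patch: int = 4,
                 d_model: int = 192, depth: int = 6, heads: int = 3) -> None:
        super().__init__()
        num_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model, heads, dim_feedforward=4 * d_model, batch_first=True, norm_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, depth)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)   # (N, num_patches, d_model)
        tokens = torch.cat([self.cls_token.expand(x.size(0), -1, -1), patches], dim=1)
        feats = self.encoder(tokens + self.pos_embed)
        return self.head(feats[:, 0])                              # classify from the CLS token
```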
Hi @nikokks, short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should also be easy to implement the decoder from ParSeq.
Hi @nikokks, are you still interested in adding ParSeq? :) After #1063, ViT has finally found its way into doctr as a backbone. And #1055 will be updated soon (which could also be used as a template for ParSeq).
Hello, I am currently implementing ParSeq. It is now working with quite good predictions :) Do you have any advice on how to make a good pull request? Best
Hey @nikokks,
You can take a look at https://github.com/mindee/doctr/pull/1055/files, which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :) Otherwise, open a PR and we will guide you step by step.
PS: If you only have the PT implementation, that's fine; we can port it to TF later :)
Finished after #1227 and #1228 are merged :)
The feature
Hello,
I mainly use the text detection and text recognition models with your framework.
As far as I have seen, the most recent text recognition models that you offer, namely MASTER and SAR, are not yet operational.
However, for text recognition there is a recent model that achieves very impressive performance: ParSeq.
Here are the references:
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
- https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
- https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
- https://github.com/baudm/parseq
Would it be possible to add this text recognition model to the ones you offer?
Thanks a lot for your work!
Motivation, pitch
I'm working with text recognition models, and a recent state-of-the-art model outperforms the others on all test datasets. I would like to use this model with your framework's pipelines.
Alternatives
No response
Additional context
No response