Open · decadance-dance opened this issue 3 weeks ago
Hi @decadance-dance :wave:,
Have you already tried these?
- docTR: https://huggingface.co/Felix92/doctr-torch-parseq-multilingual-v1
- OnnxTR: https://huggingface.co/Felix92/onnxtr-parseq-multilingual-v1

:)
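For reference, a minimal sketch of how one of these Hub models could be plugged into docTR's pipeline. This assumes docTR's `from_hub` loader and that `ocr_predictor` accepts a model instance as `reco_arch`; `page.jpg` is a placeholder path, so the block is guarded and only runs when docTR and the file are actually present:

```python
# Hedged sketch: load the multilingual recognition model from the HF Hub and
# combine it with a pretrained detection model. "page.jpg" is a placeholder.
import os

try:
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor, from_hub
    HAVE_DOCTR = True
except ImportError:
    HAVE_DOCTR = False

if HAVE_DOCTR and os.path.exists("page.jpg"):
    reco_model = from_hub("Felix92/doctr-torch-parseq-multilingual-v1")
    predictor = ocr_predictor(det_arch="db_resnet50", reco_arch=reco_model, pretrained=True)
    result = predictor(DocumentFile.from_images(["page.jpg"]))
    print(result.render())
```

OnnxTR mirrors this API closely, so the OnnxTR model linked above should work the same way with its own `from_hub`.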
Depends a bit on whether there is any data from mindee we could use. Question goes to @odulcy-mindee ^^
Hi @felixdittrich92, I've used docTR for more than half a year but never came across this multilingual model, lol. So I'm going to try it, thanks.
Ah let's keep this issue open there is more todo i think :)
Happy to hear feedback on how it works for you :) The model was fine-tuned only on synthetic data.
> Depends a bit on whether there is any data from mindee we could use. Question goes to @odulcy-mindee ^^
Unfortunately, we don't have such data
@decadance-dance For training such recognition models I don't see a problem: we can generate synthetic training data and, in the best case, only need real validation samples. But for detection we would need real data, and that's the main issue.
In general we would need the help of the community to collect documents (newspapers, receipt photos, etc.) in diverse languages (they can be unlabeled). This would require signing a license so we can freely use the data. With enough diverse data we could use Azure Doc AI, for example, to pre-label it. Later on I wouldn't see an issue with open-sourcing this dataset.
But I'm not sure how to trigger such an "event" :sweat_smile: @odulcy-mindee
Hello =) I found some public datasets for various tasks:
- English documents
- mathematics documents
- LaTeX OCR (two datasets)
- Chinese OCR (three datasets)
Moreover, it could be interesting for Chinese detection models to place multiple recognition samples in the same image without intersection. This should help a Chinese detection model perform better without real detection data. Anyone interested in creating random multilingual data for detection models (Hindi, Chinese, etc.)?
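The idea above (pasting recognition crops into a page at non-overlapping positions, keeping the boxes as detection labels) could be sketched like this; sizes, retry count, and the plain white background are assumptions:

```python
# Sketch: compose word crops into one synthetic page for detection training.
import random
from PIL import Image

def compose_page(crops, page_size=(640, 480), max_tries=50, seed=0):
    """Paste crops at random non-overlapping positions; return (page, boxes)."""
    rng = random.Random(seed)
    page = Image.new("RGB", page_size, "white")
    boxes = []  # (x0, y0, x1, y1) per placed crop -> detection ground truth
    for crop in crops:
        w, h = crop.size
        for _ in range(max_tries):
            x = rng.randint(0, page_size[0] - w)
            y = rng.randint(0, page_size[1] - h)
            box = (x, y, x + w, y + h)
            # accept only if it intersects no previously placed box
            if all(box[2] <= b[0] or box[0] >= b[2] or
                   box[3] <= b[1] or box[1] >= b[3] for b in boxes):
                page.paste(crop, (x, y))
                boxes.append(box)
                break
    return page, boxes
```

Crops that cannot be placed after `max_tries` attempts are simply skipped, so dense pages may drop some words.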
Hi @nikokks 😃 Recognition should not be such a big deal; I already found a good way to generate such data for fine-tuning.
Collecting multilingual data for detection is troublesome because it should be real data (or, if possible, really well-generated data, for example with a fine-tuned FLUX model maybe!?). We need different kinds of layouts/documents (newspapers, invoices, receipts, cards, etc.), so the data should come close to real use cases (not only scans but also document photos, etc.) :)
🚀 The feature
Support for multiple languages (i.e. `VOCABS["multilingual"]`) in pretrained models.
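For illustration, a multilingual vocab is essentially the union of per-language character sets. docTR ships its own `VOCABS` dict in `doctr.datasets`; the vocab strings below are simplified placeholders, not the real entries:

```python
# Sketch: merging per-language vocab strings into one multilingual vocab.
VOCABS = {
    "english": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "french_extra": "àâéèêëîïôùûüçÀÂÉÈÊËÎÏÔÙÛÜÇ",
    "german_extra": "äöüßÄÖÜ",
}

def merge_vocabs(*keys):
    """Concatenate vocabs, keeping the first occurrence of each character."""
    return "".join(dict.fromkeys("".join(VOCABS[k] for k in keys)))

multilingual = merge_vocabs("english", "french_extra", "german_extra")
```

A pretrained model's output layer is tied to its vocab, which is why multilingual support needs models trained against such a combined character set.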
Motivation, pitch
It would be great to use models that support multiple languages, because it would significantly improve the user experience in various cases.
Alternatives
No response
Additional context
No response