microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

DiT for Text Detection #1237

Open senthil-r-10 opened 1 year ago

senthil-r-10 commented 1 year ago

Is it possible for the model to understand curved text? If so, how? Neither the documentation nor the published paper explains it. Has anyone tried using the pre-trained model to predict on a scene text dataset?

wolfshow commented 1 year ago

@senthil-r-10, we do not have curved text in the DiT training data, so it is currently not supported. However, you can continue training this model to support curved text detection. As for scene text, what do you mean by a scene text document? Can you give some examples?

senthil-r-10 commented 1 year ago

I mean curved text detection only. I plan to use this approach for OCR text detection on documents such as receipts and invoices. Could you update the Data Preparation link in the help document? https://mmocr.readthedocs.io/en/v0.6.0/datasets/det.html#funsd

rm-asif-amin commented 8 months ago

@wolfshow I've previously trained a DONUT model to adapt to a new language using synthetic data. Do you think the same strategy might work with DiT for text detection in a foreign language?

Secondly, I can't access the model checkpoints or weights listed in /dit/text_detection. I get a PublicAccessNotPermitted error for all of the links.