microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.22k stars 2.45k forks

End-to-End OCR with LayoutReader #606

Open shreyas90999 opened 2 years ago

shreyas90999 commented 2 years ago

Model I am using: LayoutReader

Thanks for open-sourcing such amazing work! I want to run OCR on structured documents such as invoices and resumes. Could you help me understand how to achieve this with the LayoutReader model? Any scripts or approaches you can share would be helpful. I am looking for an end-to-end solution that takes an image as input and returns text, as an OCR engine does. It would also be helpful if you could explain how the model can be fine-tuned.

zlwang-cs commented 2 years ago

Thanks for your interest in our work! Doing document AI in an end-to-end way is still an open question. It is a very promising direction, and we believe LayoutReader can be a helpful part of such a pipeline. Please follow our work and see whether you can find a promising direction for tackling the task.
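
As a rough illustration of where a reading-order model such as LayoutReader could sit in such a pipeline, here is a minimal sketch. It assumes an OCR engine has already produced words with bounding boxes. `Word`, `heuristic_order`, and `read_document` are hypothetical names made up for this example, and the top-to-bottom, left-to-right sort is only a stand-in for a model's predicted order, not LayoutReader's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Word:
    """One OCR token: its text and bounding box (x0, y0, x1, y1) in pixels."""
    text: str
    box: Tuple[int, int, int, int]


def heuristic_order(words: Sequence[Word], line_tol: int = 10) -> List[int]:
    """Stand-in for a reading-order model (an assumption, not LayoutReader).

    Groups words into rough lines by bucketing the top coordinate, then sorts
    each line left to right. Returns indices into `words` in reading order.
    """
    return sorted(
        range(len(words)),
        key=lambda i: (words[i].box[1] // line_tol, words[i].box[0]),
    )


def read_document(
    words: Sequence[Word],
    order_fn: Callable[[Sequence[Word]], List[int]] = heuristic_order,
) -> str:
    """Reorder OCR words with `order_fn` and join them into plain text."""
    order = order_fn(words)
    return " ".join(words[i].text for i in order)


# OCR output often arrives in an arbitrary order; the ordering step fixes that.
ocr_words = [
    Word("Total:", (10, 50, 60, 60)),
    Word("Invoice", (10, 0, 70, 10)),
    Word("#42", (80, 2, 110, 12)),
    Word("$10", (70, 51, 100, 61)),
]
text = read_document(ocr_words)
```

Keeping the ordering step behind `order_fn` means the heuristic could later be replaced by a function that wraps a trained reading-order model, without changing the OCR or text-assembly code around it.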