Open jeffreykuang opened 3 years ago
I think it would be a good idea to have a Colab demo / tutorial for all available features so that developers can get familiar with the package.
@rakesh4real A Colab demo is planned for the next iteration.
@rakesh4real Hi, thanks for your great library. What do you think about integrating features like end-to-end spotting, in which detection and recognition are merged into a single network that learns both tasks? Some related papers:
@huyhoang17 End-to-end spotting is an important direction in OCR, and our framework can easily support end-to-end methods. We would like to reimplement them in the future. If you are interested in doing it, you are welcome to send a PR to this repo.
Production Deployment: the ability to easily deploy on ARM-based devices such as the Raspberry Pi, and on CPU-only devices. Benefits: ordinary people can detect and recognize text in documents without coding knowledge. Requirements: optimize the models for inference-only environments (TensorRT, ONNX, quantization, etc.).
Training Documentation: introduce detailed documentation on how to label images, train models, and deploy them. Requirements: simple YouTube videos and GitHub documentation.
The results do not report inference speed or memory usage. Some algorithms with huge models or complicated post-processing are very slow. This matters when deploying the algorithms.
I hope there will be an online demo, so we can quickly test images and look at the OCR results.
One more end-to-end text spotting model: PGNet, https://arxiv.org/pdf/2104.05458v1.pdf
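For anyone who wants rough numbers in the meantime, here is a minimal sketch of how speed and memory could be measured around any inference callable. The `benchmark` helper and the `fake_ocr` stand-in are hypothetical and not part of this library. Note that `tracemalloc` only sees Python-heap allocations, so GPU or native memory would need a different tool.

```python
import time
import tracemalloc
from statistics import mean


def benchmark(fn, inputs, warmup=1):
    """Measure mean latency and peak Python-heap memory of fn over inputs."""
    # Warm-up runs so one-time setup cost does not skew the timing.
    for x in inputs[:warmup]:
        fn(x)

    # Pass 1: latency, measured without tracemalloc (tracing slows code down).
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)

    # Pass 2: peak memory allocated by Python objects during inference.
    tracemalloc.start()
    for x in inputs:
        fn(x)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "runs": len(latencies),
        "mean_latency_s": mean(latencies),
        "peak_mem_bytes": peak_bytes,
    }


def fake_ocr(image):
    """Hypothetical stand-in for a real model call; maps pixels to letters."""
    return "".join(chr(65 + p % 26) for p in image)


if __name__ == "__main__":
    images = [list(range(10_000)) for _ in range(5)]
    print(benchmark(fake_ocr, images))
```

Replacing `fake_ocr` with a real inference function should give comparable per-model numbers, provided every model is run on the same images and hardware.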
Hi there, I suggest adding pre-trained models for document visual question answering (VQA).
Motivation
Document VQA is an important task in OCR. It recognizes text regions and finds the relationships between them. Such models are useful for processing visually rich documents such as tables, forms, receipts, and invoices. There are several families of document VQA algorithms, but they are maintained in different frameworks, which makes it difficult to compare their performance on downstream tasks.

| Model | Paper | Framework |
|---|---|---|
| LayoutXLM | https://arxiv.org/abs/2104.08836 | PyTorch |
| StructuralLM | https://arxiv.org/abs/2105.11210 | TensorFlow |
| StrucTexT | https://arxiv.org/abs/2108.02923 | PaddlePaddle |
Features
@cpwan Hi, thanks for your suggestion - that sounds really interesting! We'll definitely take this into our plan.
We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.
You can either:
Suggest a new feature by leaving a comment.
Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to all feature requests, so vote for your favorite one!)
Tell us that you would like to help implement one of the features in the list or review the PRs. (This is the greatest thing to hear about!)