robertknight / ocrs

Rust library and CLI tool for OCR (extracting text from images)
Apache License 2.0

Inquiry Regarding Python Support and Details on Training Models #17

Open yihong1120 opened 9 months ago

yihong1120 commented 9 months ago

Dear ocrs Contributors,

I hope this message finds you well. I've recently come across your OCR project and am thoroughly impressed with the vision and progress of ocrs. The use of machine learning to enhance OCR capabilities and the commitment to open datasets is commendable.

As a developer with a keen interest in OCR technologies, I have a couple of queries that I hope you might be able to address:

  1. Python Support: Given the extensive use of Python in the data science and machine learning communities, I was curious to know if there are any plans to introduce a Python version of ocrs. A Python API could significantly increase the accessibility of ocrs to a broader audience and facilitate integration into existing Python-based workflows. Are there any roadmaps or discussions around this that you could share?

  2. Training Models: I am eager to learn more about the neural network models that ocrs utilises. Specifically, which model architectures have been employed for the OCR tasks? Are these models based on any well-known architectures such as CNNs, RNNs, or Transformers? Furthermore, could you provide insights into the datasets used for training these models? Understanding the models' underpinnings would be incredibly beneficial for potential contributors and users looking to gauge the engine's capabilities and limitations.

I appreciate the time and effort that goes into maintaining and developing such an ambitious project. Thank you for considering my queries, and I look forward to any information you can provide.

Best regards, yihong1120

robertknight commented 9 months ago

> I was curious to know if there are any plans to introduce a Python version of ocrs

Yes, bindings to support use in other languages are planned. Python and JS would be near the top of the list. See also https://github.com/robertknight/ocrs/issues/2.

> I am eager to learn more about the neural network models that ocrs utilises. Specifically, which model architectures have been employed for the OCR tasks?

There are more details on the model architecture and training dataset in the ocrs-models repository's README. In short, text detection uses a CNN (a U-Net-type architecture) to produce a binary text/not-text mask, and recognition uses a CNN followed by an RNN. Note that the documentation in this repo is limited, and I'm planning on expanding it soon.
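
To make the "text/not-text binary mask" idea concrete, here is a minimal Python sketch of how a per-pixel probability map from a detection network could be turned into a binary mask and then into rough text-line row ranges. This is not the ocrs implementation: the function names `binarize_mask` and `text_row_spans`, the 0.5 threshold, and the row-span grouping are all assumptions made for illustration, and the real pipeline may post-process the mask quite differently.

```python
from typing import List, Tuple


def binarize_mask(probs: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-pixel text probabilities into a 0/1 text/not-text mask.

    The 0.5 threshold is an assumed default, not a value taken from ocrs.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def text_row_spans(mask: List[List[int]]) -> List[Tuple[int, int]]:
    """Group consecutive rows that contain any text pixel.

    Returns (start, end) row ranges with `end` exclusive. This is a crude
    stand-in for real text-line detection, used only to show how a mask
    can be consumed downstream.
    """
    spans = []
    start = None
    for y, row in enumerate(mask):
        has_text = any(row)
        if has_text and start is None:
            start = y  # a new run of text rows begins
        elif not has_text and start is not None:
            spans.append((start, y))  # the run ended on the previous row
            start = None
    if start is not None:
        spans.append((start, len(mask)))  # run extends to the last row
    return spans


# Hypothetical 5x3 probability map, as a detection model might output.
probs = [
    [0.1, 0.2, 0.1],
    [0.8, 0.9, 0.3],
    [0.7, 0.6, 0.2],
    [0.0, 0.1, 0.0],
    [0.2, 0.95, 0.4],
]
mask = binarize_mask(probs)
spans = text_row_spans(mask)
```

In this toy input, rows 1 and 2 form one run of text rows and row 4 forms another, so a recognition stage would be handed two separate regions.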