openai / requests-for-research

A living collection of deep learning problems
https://openai.com/requests-for-research

Im2Latex request data for OCRlib #17

Closed RimeOCRLIB closed 8 years ago

RimeOCRLIB commented 8 years ago

Good day! Could you provide the formula data that you need recognized as part of the Im2Latex research request? We are developing an open-source C++ OCR library, OCRLib, which has matured over 10 years and has more than 500,000 pages of OCR in process (https://github.com/RimeOCRLIB/OCRLib). It is an OCR engine based on a five-level convolutional network algorithm, with grammar rules and dictionary correction. We can build a version for the mathematical formula recognition described on the research request page. OCRLib could serve as a base for applications in mathematical linguistics, visual recognition, mathematical analysis, and modelling. Best regards, Alexander Stroganov gomde@mail.ru www.buddism.ru/ocrlib

ilyasu123 commented 8 years ago

It would be nice to obtain handwritten formulas with their latex "sources" but these are obviously hard to get, so we're using latex formulas that are compiled from latex sources. Also the end goal of this request for research is to provide an environment for ML enthusiasts to practice implementing attention and sequence models where the result can be interesting.

RimeOCRLIB commented 8 years ago

Thank you for the reply. We can build an OCRLib implementation that recognizes mathematical formulas and outputs LaTeX. We already use OCRLib to OCR Tibetan and other hieroglyphic texts into Unicode, so mathematical formula recognition is one of the possibilities. All we need to know is whether you are interested in that result. It would take about one week of work to build the LaTeX rules and train the neural network on formula examples.

ilyasu123 commented 8 years ago

I am grateful for the offer but there is no need to do so, unless you feel that you'll enjoy this exercise. This particular request for research is not something that we want built -- it is something that will help ML enthusiasts become more proficient in ML.