karpathy / neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU

Is NeuralTalk2 a potential candidate for Optical Formula Recognition? #125

ghost closed this issue 8 years ago

ghost commented 8 years ago

Thanks for the great project; apologies if this is not the right place to ask.

I'm looking into the state of the art in Optical Formula Recognition. There don't seem to be any very good results yet, and I wonder whether neural networks could bring a breakthrough in this area.

Assume we have enough labelled training data, with images of formulas like:

[image: formula1]

or

[image: formula2]

and "captions" for those images that are valid MathML source code, such as:

    <mrow>
      <mi>a</mi> <mo>&InvisibleTimes;</mo> <msup><mi>x</mi><mn>2</mn></msup>
      <mo>+</mo> <mi>b</mi> <mo>&InvisibleTimes;</mo> <mi>x</mi>
      <mo>+</mo> <mi>c</mi>
    </mrow>

(more examples omitted)
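
A captioning model like NeuralTalk predicts one token at a time from a fixed vocabulary, so MathML captions would first have to be split into tokens and mapped to integer ids. The following is a minimal Python sketch under assumed conventions: the regex tokenizer, the `[START]`/`[END]`/`[UNK]` markers, and the function names `tokenize_mathml`, `build_vocab` and `encode` are all hypothetical and not taken from neuraltalk2.

```python
import re
from collections import Counter

# Hypothetical tokenizer: a whole tag (with any attributes), an entity
# such as &InvisibleTimes;, or a run of non-space text is one token.
TOKEN_RE = re.compile(r"</?[A-Za-z][^>]*>|&#?[A-Za-z0-9]+;|[^\s<&]+")

# Hypothetical special markers; ids 0, 1 and 2 are reserved for them.
SPECIALS = ["[START]", "[END]", "[UNK]"]


def tokenize_mathml(source):
    """Split MathML source into tag, entity and text tokens."""
    return TOKEN_RE.findall(source)


def build_vocab(captions, min_count=1):
    """Map every token seen at least `min_count` times to an integer id."""
    counts = Counter(tok for cap in captions for tok in tokenize_mathml(cap))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, n in sorted(counts.items()):  # sorted for deterministic ids
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Turn a caption into ids wrapped in [START] ... [END]; rare tokens become [UNK]."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokenize_mathml(caption)]
    return [vocab["[START]"]] + ids + [vocab["[END]"]]


if __name__ == "__main__":
    caption = "<mrow><mi>a</mi><mo>&InvisibleTimes;</mo><msup><mi>x</mi><mn>2</mn></msup></mrow>"
    vocab = build_vocab([caption])
    print(tokenize_mathml(caption))
    print(encode(caption, vocab))
```

Treating each tag, entity and text run as a single token keeps sequences much shorter than a character-level (char-rnn style) encoding, at the cost of a larger vocabulary; which choice works better for formulas is an open question.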

After training, could we expect a NeuralTalk-like model to generate valid MathML descriptions for images of mathematical formulas?
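
Whether a generated caption is "valid" can at least partly be checked mechanically. The sketch below, assuming Python's standard `xml.etree.ElementTree` parser, first tests XML well-formedness and then checks the child count of a few MathML layout elements (for example, `msup` takes exactly a base and a superscript). It is only a partial check, not a full MathML validator; the `check_mathml` name and the `ARITY` table are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Required number of child elements for some MathML layout schemata
# (a subset only; e.g. msup = base + superscript).
ARITY = {"mfrac": 2, "mroot": 2, "msub": 2, "msup": 2, "munder": 2,
         "mover": 2, "msubsup": 3, "munderover": 3}

# Named entities like &InvisibleTimes; are unknown to a plain XML parser
# without a DTD, so they are stripped before parsing (numeric refs are kept).
ENTITY_RE = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def check_mathml(source):
    """Return a list of problems found in `source`; an empty list means it passed."""
    try:
        root = ET.fromstring(ENTITY_RE.sub("", source))
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a namespace like {http://...}
        expected = ARITY.get(tag)
        if expected is not None and len(elem) != expected:
            problems.append(f"<{tag}> has {len(elem)} children, expected {expected}")
    return problems


if __name__ == "__main__":
    good = "<mrow><mi>a</mi><mo>&InvisibleTimes;</mo><msup><mi>x</mi><mn>2</mn></msup></mrow>"
    print(check_mathml(good))
    print(check_mathml("<mrow><msup>x 2</msup></mrow>"))  # msup without child elements
    print(check_mathml("<mrow><mi>a</mi>"))  # truncated output
```

A check like this could be used to measure how often a trained model's output is at least structurally valid, separately from whether it matches the formula in the image.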

From https://github.com/karpathy/char-rnn we know that neural networks can generate (almost) syntactically valid XML, LaTeX, HTML, C, and so on, so it would be cool to use a neural network to convert images into valid formulas. If that works, we are one step closer to converting informal mathematical articles into formal ones for theorem provers such as Isabelle, Mizar and Coq. That would rapidly grow databases of formalized mathematical theorems such as [1], which in turn would help research on machine-learning-based theorem proving such as [2].

Any insight is greatly appreciated, thank you!

[1] https://www.isa-afp.org/
[2] https://arxiv.org/abs/1606.04442

fantine16 commented 8 years ago

Have you trained it with an RNN?

ghost commented 8 years ago

I haven't done any experiments with NeuralTalk + MathML yet. What I did before was train char-rnn on C source code, and it generated some interesting C code with a few syntax errors. @fantine16, any thoughts are appreciated :)

ghost commented 8 years ago

Hi folks,

Answering my own question here: anyone interested in OCR for mathematical expressions should have a look at [1]. The authors tried a "Show, Attend and Tell" model as well as an enhanced version with an extra RNN, and report very promising results.
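
For readers unfamiliar with "Show, Attend and Tell": at each decoding step the model scores every spatial location of the CNN feature map against the decoder's previous hidden state, turns the scores into weights with a softmax, and feeds the weighted average of the feature vectors (the context vector) into the RNN. Below is a minimal pure-Python sketch of that deterministic "soft" attention step. The one-hidden-layer tanh scorer and the weight names `W_a`, `W_h` and `v` are assumptions for illustration, and the enhanced version in [1] with the extra RNN is not shown.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(annotations, h_prev, W_a, W_h, v):
    """One step of deterministic soft attention.

    annotations: L feature vectors a_i (one per image location), each of size D
    h_prev: previous decoder hidden state, size H
    W_a (K x D), W_h (K x H), v (size K): hypothetical scorer weights
    Returns (alpha, context): weights summing to 1, and sum_i alpha_i * a_i.
    """
    proj_h = matvec(W_h, h_prev)
    scores = []
    for a in annotations:
        # score e_i = v . tanh(W_a a_i + W_h h_prev)
        hidden = [math.tanh(x + y) for x, y in zip(matvec(W_a, a), proj_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)
    dim = len(annotations[0])
    context = [sum(alpha[i] * annotations[i][d] for i in range(len(annotations)))
               for d in range(dim)]
    return alpha, context


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 locations, D = 2
    alpha, context = soft_attention(feats, [0.5],
                                    W_a=[[1.0, 0.0], [0.0, 1.0]],
                                    W_h=[[1.0], [-1.0]],
                                    v=[1.0, 1.0])
    print(alpha, context)
```

With these toy weights the third location scores highest and receives the largest weight; in a real model the weights are learned jointly with the CNN encoder and the RNN decoder.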

This task is now one of OpenAI's "Requests for Research" topics [2]; hopefully more people will work on it.

I'm closing this issue since [1] answers the question. Anyone interested in the topic, feel free to email me for further discussion :)

Thanks, karpathy, for the original work, which has inspired so many new results!

[1] http://lstm.seas.harvard.edu/latex/
[2] https://openai.com/requests-for-research/#im2latex