Closed ZhigangPu closed 5 years ago
Hi Zhigang, where are the test images from? The pretrained model was trained on LaTeX rendered in a single vanilla setting, so anything out-of-domain likely won't work. To get a model that can recognize arbitrary images, we need to add distortions and artifacts to the training data (via data augmentation) or include handwritten data (as Mathpix did); the trained model can then work under varied settings.
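To make the data augmentation idea concrete, here is a minimal sketch of the kind of distortions meant. It is not code from this repo: the function names (`shift`, `box_blur`, `add_gaussian_noise`, `augment`) and the parameter defaults are made up for illustration, and it assumes a grayscale image stored as a list of rows of floats in [0, 1] with a white (1.0) background. It uses only the standard library so it runs anywhere; a real pipeline would more likely use an image library.

```python
import random


def shift(img, dx, dy, fill=1.0):
    """Translate the image by (dx, dy) pixels; uncovered pixels become `fill` (white)."""
    h, w = len(img), len(img[0])
    return [
        [img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]


def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel, then clip to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def augment(img, rng, max_shift=2, max_sigma=0.1, blur_prob=0.5):
    """Apply a random shift, an optional blur, and random-strength noise.

    The defaults are illustrative guesses, not tuned values.
    """
    out = shift(img,
                rng.randint(-max_shift, max_shift),
                rng.randint(-max_shift, max_shift))
    if rng.random() < blur_prob:
        out = box_blur(out)
    return add_gaussian_noise(out, rng.uniform(0.0, max_sigma), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 6x8 "formula": white background with a short black stroke.
    image = [[1.0] * 8 for _ in range(6)]
    for x in range(2, 6):
        image[3][x] = 0.0
    noisy = augment(image, rng)
    print(len(noisy), len(noisy[0]))
```

Calling `augment` on each image every time it is drawn for training gives the model a different variant per epoch, which is usually the point of adding such distortions.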
Thanks for replying! The test images are screenshots from arbitrary sources such as papers, books, or Google image results. There's little noise. May I ask whether you have tested the model on these sources before, and how it behaved?
Thanks for the reminder about data augmentation; I'll look into it.
Oh, that's why. I tried it on screenshots before and it didn't work well. However, I'm pretty sure it would work if you included those variants in the training set, as Mathpix has shown.
I got high generalization error when predicting on LaTeX formula images from the real world. For example, below is the prediction for one formula image:
\begin{array} { c c } { { { { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } &
And this is my training result:
EM 14.03 - BLEU-4 74.61 - perplexity -1.42 - Edit 78.67
Has anyone else been stuck in the same situation?
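For reading the EM and Edit numbers: a large gap between the two usually means most predictions are close to the reference but few match it exactly. Below is a minimal sketch of how such scores are commonly computed over whitespace-separated LaTeX tokens. This is an assumption for illustration, not the repo's evaluation script, which may define both scores differently; the names `edit_distance` and `score` are made up here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def score(preds, refs):
    """Return (exact match %, edit-distance score %) over whitespace tokens.

    Assumed definition of the edit score:
    100 * (1 - total edit distance / total max(len(pred), len(ref))).
    """
    exact = total_dist = total_len = 0
    for p, r in zip(preds, refs):
        pt, rt = p.split(), r.split()
        exact += pt == rt
        total_dist += edit_distance(pt, rt)
        total_len += max(len(pt), len(rt))
    n = len(preds)
    em = 100.0 * exact / n if n else 0.0
    edit = 100.0 * (1 - total_dist / total_len) if total_len else 0.0
    return em, edit


if __name__ == "__main__":
    preds = ["x ^ { 2 }", "\\frac { a } { b }"]
    refs = ["x ^ { 2 }", "\\frac { a } { c }"]
    print(score(preds, refs))
```

Under this definition a single wrong token makes a formula count as an exact-match failure while barely lowering the edit score.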
Hello, have you solved this problem? I have the same problem.
Hi @hengyeliu, this is normal behavior for neural-network-based approaches. The released model was pretrained only on one particular rendering of LaTeX symbols, so it is not robust to noise at all. To make it work on real formulas, you need to add noise during training as well.
Thanks for your reply, I will try your suggestion