harvardnlp / im2markup

Neural model for converting Image-to-Markup (by Yuntian Deng yuntiandeng.com)
https://im2markup.yuntiandeng.com
MIT License

Experimental data set #22

zhangwei2019 closed 5 years ago

zhangwei2019 commented 5 years ago

I found that some of the experiments in your paper were run on the CROHME dataset, but the CROHME dataset is not in an image format. How did you deal with that? Thank you.

danipozo commented 5 years ago

Hi, from the paper:

The dataset is in a different domain from our rendered images and is designed for stroke-based OCR. To handle these differences, we employ two extensions: (1) We convert the data to images by rendering the strokes and also augment data by randomly resizing and rotating symbols, (2) We also employ the simulated IM2LATEX-100K handwriting dataset to pretrain a large out-of-domain model and then fine-tune it on this CROHME dataset.
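To make step (1) concrete, here is a minimal sketch of what "rendering the strokes" with a random resize and rotation could look like. It is not the authors' code. It assumes the strokes come as InkML `<trace>` elements whose text is a comma-separated list of "x y" samples, and the function names (`parse_traces`, `augment`, `render`), the angle and scale ranges, and the 64x64 canvas are all made up for illustration. For brevity it augments all strokes together, whereas the paper talks about resizing and rotating symbols.

```python
import math
import random
import xml.etree.ElementTree as ET

# Namespace used by InkML files (assumed to apply to the CROHME data).
INKML_NS = "{http://www.w3.org/2003/InkML}"


def parse_traces(inkml_text):
    """Return a list of strokes; each stroke is a list of (x, y) floats."""
    root = ET.fromstring(inkml_text)
    strokes = []
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        for sample in (trace.text or "").strip().split(","):
            fields = sample.split()
            if len(fields) >= 2:  # ignore extra channels such as time
                points.append((float(fields[0]), float(fields[1])))
        if points:
            strokes.append(points)
    return strokes


def augment(strokes, rng, max_angle_deg=10.0, scale_range=(0.8, 1.2)):
    """Rotate and scale the strokes about their centroid (illustrative ranges)."""
    pts = [p for stroke in strokes for p in stroke]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    return [
        [(cx + a * (x - cx) - b * (y - cy), cy + b * (x - cx) + a * (y - cy))
         for x, y in stroke]
        for stroke in strokes
    ]


def render(strokes, size=64, margin=2):
    """Rasterize strokes into a size x size grid of 0/1 pixels."""
    pts = [p for stroke in strokes for p in stroke]
    min_x, max_x = min(x for x, _ in pts), max(x for x, _ in pts)
    min_y, max_y = min(y for _, y in pts), max(y for _, y in pts)
    span = max(max_x - min_x, max_y - min_y) or 1.0
    k = (size - 1 - 2 * margin) / span  # uniform scale keeps the aspect ratio

    def to_px(p):
        return (margin + round((p[0] - min_x) * k),
                margin + round((p[1] - min_y) * k))

    grid = [[0] * size for _ in range(size)]
    for stroke in strokes:
        px = [to_px(p) for p in stroke]
        grid[px[0][1]][px[0][0]] = 1  # covers single-point strokes
        for (x0, y0), (x1, y1) in zip(px, px[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):  # simple linear interpolation
                x = round(x0 + (x1 - x0) * i / steps)
                y = round(y0 + (y1 - y0) * i / steps)
                grid[y][x] = 1
    return grid
```

Calling `render(augment(parse_traces(text), random.Random(0)))` on the contents of one InkML file would give a binary grid that can be saved as an image with whatever imaging library you prefer. Seeding the random generator keeps the augmentation reproducible.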

da03 commented 5 years ago

Thanks for danipozo's clarification! The IM2LATEX-100K handwritten dataset can be found here: http://lstm.seas.harvard.edu/latex/data/