Closed · zhangwei2019 closed this issue 5 years ago
Hi, from the paper:
The dataset is in a different domain from our rendered images and is designed for stroke-based OCR. To handle these differences, we employ two extensions: (1) We convert the data to images by rendering the strokes and also augment data by randomly resizing and rotating symbols, (2) We also employ the simulated IM2LATEX-100K handwriting dataset to pretrain a large out-of-domain model and then fine-tune it on this CROHME dataset.
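In case it helps, here is a minimal sketch of how extension (1) could be done, assuming each symbol is a list of strokes and each stroke is a list of (x, y) points (as in CROHME's trace data). The function names (`transform_strokes`, `render_strokes`) and the grid size are my own illustration, not the paper's actual code; the random resize/rotate is applied to the stroke coordinates before rasterizing:

```python
import math

def transform_strokes(strokes, scale=1.0, angle_deg=0.0):
    """Augmentation sketch: scale and rotate every stroke point about the
    origin (draw scale/angle_deg at random per sample for augmentation)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[(scale * (x * cos_t - y * sin_t),
              scale * (x * sin_t + y * cos_t)) for x, y in stroke]
            for stroke in strokes]

def render_strokes(strokes, size=32):
    """Rasterize strokes onto a size x size binary grid, normalizing the
    coordinates to fit and connecting consecutive points by interpolation."""
    pts = [p for s in strokes for p in s]
    min_x = min(x for x, _ in pts)
    min_y = min(y for _, y in pts)
    span = max(max(x for x, _ in pts) - min_x,
               max(y for _, y in pts) - min_y, 1e-9)

    def to_px(x, y):
        return (int((x - min_x) / span * (size - 1)),
                int((y - min_y) / span * (size - 1)))

    grid = [[0] * size for _ in range(size)]
    for stroke in strokes:
        px = [to_px(x, y) for x, y in stroke]
        for (x0, y0), (x1, y1) in zip(px, px[1:] or px):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for t in range(steps + 1):
                xi = round(x0 + (x1 - x0) * t / steps)
                yi = round(y0 + (y1 - y0) * t / steps)
                grid[yi][xi] = 1
    return grid
```

Usage would be something like `render_strokes(transform_strokes(strokes, scale=1.1, angle_deg=7))` with the scale and angle sampled randomly per training example.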
Thanks for danipozo's clarification! The IM2LATEX-100K handwritten dataset can be found here: http://lstm.seas.harvard.edu/latex/data/
I found that some of the experiments in your paper were run on the CROHME dataset, but the CROHME dataset is not provided in image format (it is stroke data). How do you deal with that? Thank you.