harvardnlp / im2markup

Neural model for converting Image-to-Markup (by Yuntian Deng yuntiandeng.com)
https://im2markup.yuntiandeng.com
MIT License

Testing requires ground-truth labels to be specified to get reasonable results #16

Closed: SunLoveSheep closed this issue 5 years ago

SunLoveSheep commented 6 years ago

Hi,

I tried to use the Math-to-LaTeX Toy Example pre-trained model and tested it on my own equation, using the following command:

```
th src/train.lua -phase test -gpu_id 1 -load_model -model_dir model/latex -visualize \
    -data_base_dir data/sample/images_processed/ \
    -data_path data/sample/test_filter.lst \
    -label_path data/sample/formulas.norm.lst \
    -output_dir results \
    -max_num_tokens 500 -max_image_width 800 -max_image_height 800 \
    -batch_size 5 -beam_size 5
```

When I follow your provided steps and test on your test data, everything works fine. But when I change `-data_base_dir` and `-data_path` to point to my own cropped equations (such as 9+9+8=26, all in a printed font, nothing handwritten) and keep `-label_path` unchanged, the test output `results.txt` is still nearly identical to the ground-truth labels in your `formulas.norm.lst`. Even if I change my equations, as long as `formulas.norm.lst` is unchanged, the test output stays the same. But once I change `formulas.norm.lst` to contain the correct LaTeX expressions for my equations, the test output starts to make sense. Why is this the case? I assumed the model would predict labels without the assistance of the ground-truth labels; the labels should only be used to compute the loss, edit distance, and so on.
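As a side note, here is a minimal sketch for sanity-checking that a custom image fits the `-max_image_width`/`-max_image_height` limits used in the command above. It assumes (this is not confirmed in the thread) that oversized images are filtered out at test time rather than resized; the helper name and file path are hypothetical.

```python
# Minimal sketch: check that a custom test image fits within the size limits
# passed to train.lua (-max_image_width 800 -max_image_height 800).
# Assumption: oversized images are skipped rather than resized, so they would
# silently produce no prediction.
from PIL import Image

def check_image(path, max_w=800, max_h=800):
    w, h = Image.open(path).size  # PIL reports (width, height)
    status = "OK" if (w <= max_w and h <= max_h) else f"exceeds {max_w}x{max_h}"
    print(f"{path}: {w}x{h} {status}")

check_image("data/sample/images_processed/my_equation.png")  # hypothetical path
```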

da03 commented 5 years ago

Oh really? During testing, the ground-truth labels are only used for evaluating perplexity (PPL), and providing random labels would not affect the translation results. Are you sure you looked at the correct column? The results file is tab-separated and contains both the ground-truth labels and the predictions. I think it is very likely that you looked at the ground-truth column and that the real predictions are actually bad: neural networks are very sensitive to domain shift, so using the provided model on a new test domain would likely cause problems, and you probably need to change the image size/resolution to improve the test results.
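To tell the columns apart, here is a minimal sketch that prints the tab-separated fields of one line of the results file; the exact column order is an assumption here, so identify the prediction column from the output:

```python
# Minimal sketch: print the tab-separated fields of the first results line so
# the ground-truth column can be told apart from the prediction column.
# The column order is an assumption; identify it from the printed output.
with open("results/results.txt") as f:
    fields = f.readline().rstrip("\n").split("\t")
    for i, field in enumerate(fields):
        print(f"column {i}: {field[:80]}")
```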

SunLoveSheep commented 5 years ago

Hi Yuntian,

Sorry for the stupid question, lol... Yes, you are right, I was looking at the wrong column. The fine-tuned model is now working as expected. Thanks for the reply!

da03 commented 5 years ago

Great! Closing this issue now.