Clarifications on loss definition

Hi! Thanks for your amazing work! I'm currently working on extending it to handwritten and multiline LaTeX. However, I'm struggling to understand the loss function you use during training.

Could you clarify which loss you are using and how it is computed? I see that, during training, you refer to model.data_parallel, but I can't find an explicit call to a loss function.

Thanks again! Looking forward to your reply.

lukas-blecher / LaTeX-OCR

Clarifications on loss definition #373