huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to improve accuracy (Model on TrOCR) #16458

C108152117 closed this issue 2 years ago

C108152117 commented 2 years ago

I am using the TrOCR model to generate LaTeX sequences from images of handwritten math expressions, but the results are unsatisfactory. Below are the eval/loss and train loss curves. The repository is: https://github.com/win5923/TrOCR-Handwritten-Mathematical-Expression-Recognition

[images: eval/loss and train loss plots]

[image]

I tested this model on CROHME 2014, but it performs worse than the models shown in the image above.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ekdma7077 commented 1 year ago

I am not certain, but I think the pretrained model's tokenizer is not optimized for math LaTeX, so you should use a tokenizer trained on LaTeX. Either find an existing one, or collect LaTeX data and create a new tokenizer using sentencepiece.
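To make the idea of a LaTeX-specific vocabulary concrete, here is a minimal sketch using only the Python standard library. It is not sentencepiece and not the tokenizer from the linked repository: it swaps in a simple regex split that keeps each LaTeX command (such as `\frac`) as a single token. The pattern, the function names (`tokenize_latex`, `build_vocab`, `encode`), the special tokens, and the three-line corpus are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: a LaTeX command (\frac), an escaped symbol (\{),
# or any single non-whitespace character.
LATEX_TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize_latex(expr):
    """Split a LaTeX string into command-level and character-level tokens."""
    return LATEX_TOKEN.findall(expr)


def build_vocab(corpus, specials=("<pad>", "<s>", "</s>", "<unk>")):
    """Assign ids to special tokens first, then to corpus tokens by frequency."""
    counts = Counter(tok for expr in corpus for tok in tokenize_latex(expr))
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counts.most_common():
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(expr, vocab):
    """Map an expression to token ids, using <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokenize_latex(expr)]


# Toy corpus standing in for collected LaTeX labels.
corpus = [r"\frac{a}{b}", r"x^{2} + y^{2}", r"\sqrt{x}"]
vocab = build_vocab(corpus)
print(tokenize_latex(r"\frac{a}{b}"))
print(encode(r"\sqrt{x}", vocab))
```

A learned subword tokenizer such as sentencepiece would derive its pieces from data instead of a fixed rule, but the goal is the same: common LaTeX constructs should map to few tokens rather than being split into arbitrary fragments.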

BizmanSS commented 7 months ago

Doesn't training a new tokenizer mess with the pretrained weights, since they were trained with a specific tokenizer?
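The concern can be illustrated with a toy example. A new tokenizer assigns different ids to tokens, so the rows of the decoder's token embedding matrix (and its output layer) no longer line up with the vocabulary, while the image encoder does not consume token ids at all. The sketch below, in plain Python, copies embedding rows for tokens that exist in both vocabularies and leaves the rest to be learned during fine-tuning. The vocabularies, the two-dimensional "embeddings", and the helper `remap_embeddings` are hypothetical and are not part of the transformers API.

```python
# Hypothetical toy vocabularies; real ones have tens of thousands of entries.
old_vocab = {"<s>": 0, "</s>": 1, "a": 2, "b": 3}
new_vocab = {"<s>": 0, "</s>": 1, "\\frac": 2, "{": 3, "}": 4, "a": 5, "b": 6}

# Stand-in for a pretrained decoder embedding matrix: one row per old token id.
old_embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]


def remap_embeddings(old_vocab, new_vocab, old_embeddings, dim=2):
    """Copy rows for tokens shared by both vocabularies; zero-fill new tokens."""
    new_embeddings = [[0.0] * dim for _ in range(len(new_vocab))]
    reused = []
    for tok, new_id in new_vocab.items():
        old_id = old_vocab.get(tok)
        if old_id is not None:
            new_embeddings[new_id] = list(old_embeddings[old_id])
            reused.append(tok)
    return new_embeddings, reused


new_embeddings, reused = remap_embeddings(old_vocab, new_vocab, old_embeddings)
print(reused)             # tokens whose pretrained rows were kept
print(new_embeddings[5])  # row for "a", now at id 5 instead of 2
```

In this toy case the token "a" moves from id 2 to id 5, so reusing the old matrix unchanged would look up the wrong row. In a real setup the new rows would normally be randomly initialized rather than zeroed, and the decoder would need fine-tuning on the LaTeX data before its outputs become meaningful.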