Closed itewqq closed 3 years ago
Hello, I'm sorry for the difficult first contact with the program. I'll try to explain what's happening.
First, regarding the complicated formula: could you show me what the input image was? Sometimes the network gets stuck in a feedback loop where it predicts the same token over and over again, and the result is then most likely wrong. I will implement an early stop condition in the future. Right now I can only recommend trying again, maybe with a different temperature, or splitting the image up into multiple smaller parts.
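To illustrate what such an early stop condition could look like, here is a minimal sketch. It is not the code used in this repo: `has_repeating_tail`, `decode`, the `next_token` callable and the thresholds `max_period` and `min_repeats` are all hypothetical names and values chosen for the example. The idea is simply to stop decoding once the output ends in a short pattern repeated many times.

```python
from typing import Callable, List, Sequence


def has_repeating_tail(tokens: Sequence[int], max_period: int = 4,
                       min_repeats: int = 8) -> bool:
    """Return True if `tokens` ends with a pattern of length <= max_period
    that is repeated at least `min_repeats` times in a row."""
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if len(tokens) < needed:
            continue
        tail = tokens[-needed:]
        pattern = tail[:period]
        if all(tail[i] == pattern[i % period] for i in range(needed)):
            return True
    return False


def decode(next_token: Callable[[List[int]], int], eos_id: int,
           max_len: int = 256) -> List[int]:
    """Greedy decoding loop with an early stop on repetition.

    `next_token` is a stand-in (assumption) for one model step: it receives
    the tokens generated so far and returns the next token id.
    """
    tokens: List[int] = []
    while len(tokens) < max_len:
        tok = next_token(tokens)
        if tok == eos_id:
            break
        tokens.append(tok)
        if has_repeating_tail(tokens):
            break  # the model is probably stuck in a loop
    return tokens
```

With these default thresholds, a model step that always returns the same token is cut off after 8 tokens instead of running to `max_len`. Legitimate formulas can contain repetition too, so the thresholds would need tuning on real outputs.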
Now, regarding the simple predictions: I am aware that very small images do not work right now. That is due to a mistake during training where I omitted small images from the dataset. In practice that means only images with a width >96px work as expected (if I recall correctly). I'm sure that if you try somewhat larger images the result will be as expected. Note that the \LaTeX command is not math typesetting and thus not in the training set, which is why that also didn't work.
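As a possible workaround for small snips, one could upscale the image before passing it to the model. The sketch below only computes the target size. The helper name `upscaled_size` is hypothetical, and the threshold of 97px is an assumption taken from the "width >96px" remark above, which was itself stated from memory.

```python
from typing import Tuple

# Assumption: smallest width that satisfies "width > 96px" from the comment above.
MIN_WIDTH = 97


def upscaled_size(width: int, height: int,
                  min_width: int = MIN_WIDTH) -> Tuple[int, int]:
    """Return a (width, height) that is at least `min_width` wide,
    keeping the aspect ratio. Images that are wide enough are unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width >= min_width:
        return width, height
    # Integer ceiling division so the height is never rounded down.
    new_height = -(-height * min_width // width)
    return min_width, new_height
```

With Pillow, the result could be applied as `img.resize(upscaled_size(*img.size))`. Whether upscaled snips are recognized as well as natively larger ones is not something this thread confirms.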
Please let me know if you can verify what I've written down. Thanks
@lukas-blecher Thanks for the reply! My issue's title may have been kind of rude, but it's really sad that everybody else seems to be able to make it work except me... In fact, I am very grateful for the work you've done here.
For the complicated formula, it is just the one shown in the GIF above, which is part of the README file of this repo.
For the size issue you mentioned, I just tested another, bigger snip, but it's still not working:
Ok that should definitely not happen. On my machine there is no problem recognizing the formula whatsoever. Can you please provide some additional information?
image_resizer.pth?
@lukas-blecher
Windows 10 20H2 19042.1165
Yes, I downloaded the weights.pth and image_resizer.pth mentioned in the README and put them in the checkpoints folder:
I installed the dependencies in a conda env. Here is my package info:
(latexocr) PS C:\Users\hp\Desktop\Web\LaTeX-OCR> pip list
Package Version
---------------------- -------------------
albumentations 1.0.3
certifi 2021.5.30
chardet 4.0.0
charset-normalizer 2.0.4
click 8.0.1
colorama 0.4.4
cycler 0.10.0
einops 0.3.2
entmax 1.0
filelock 3.0.12
huggingface-hub 0.0.16
idna 3.2
imageio 2.9.0
imagesize 1.2.0
joblib 1.0.1
kiwisolver 1.3.2
matplotlib 3.4.3
munch 2.5.0
networkx 2.6.2
numpy 1.21.2
opencv-python-headless 4.5.3.56
packaging 21.0
pandas 1.3.2
Pillow 8.3.2
pip 21.0.1
pynput 1.7.3
pyparsing 2.4.7
PyQt5 5.15.4
PyQt5-Qt5 5.15.2
PyQt5-sip 12.9.0
PyQtWebEngine 5.15.4
PyQtWebEngine-Qt5 5.15.2
python-dateutil 2.8.2
python-Levenshtein 0.12.2
pytz 2021.1
PyWavelets 1.1.1
PyYAML 5.4.1
regex 2021.8.28
requests 2.26.0
sacremoses 0.0.45
scikit-image 0.18.3
scipy 1.7.1
screeninfo 0.7
setuptools 52.0.0.post20210125
six 1.16.0
tifffile 2021.8.30
timm 0.4.12
tokenizers 0.9.4
torch 1.9.0
torchtext 0.10.0
torchvision 0.10.0
tqdm 4.62.2
transformers 4.2.2
typing-extensions 3.10.0.2
urllib3 1.26.6
wheel 0.37.0
wincertstore 0.2
x-transformers 0.17.12
Thank you very much. I was able to replicate the issue.
It turns out the problem was in the dependencies after all, namely the x-transformers package.
I'll work on a fix. In the meantime you can downgrade to a previous version:
pip install -U x-transformers==0.12.1
That should work. Thanks again for bringing the issue to my attention.
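To check which version is installed before or after the downgrade, something like the following sketch could be used. The helper names `version_tuple` and `check_x_transformers` are hypothetical. This thread only establishes that 0.17.12 fails and 0.12.1 works, so treating every version newer than 0.12.1 as suspect is an assumption, not a confirmed boundary.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Version reported as working in this thread.
KNOWN_GOOD = "0.12.1"


def version_tuple(v: str) -> Tuple[int, ...]:
    """Parse a plain numeric version such as '0.17.12' into (0, 17, 12).
    Pre-release or local version suffixes are not handled."""
    return tuple(int(part) for part in v.split("."))


def check_x_transformers(installed: Optional[str] = None) -> str:
    """Return a short message about the installed x-transformers version.
    Pass `installed` explicitly to check a version string without a lookup."""
    if installed is None:
        try:
            installed = version("x-transformers")
        except PackageNotFoundError:
            return "x-transformers is not installed"
    if version_tuple(installed) > version_tuple(KNOWN_GOOD):
        # Assumption: anything newer than the known-good version may be affected.
        return (f"x-transformers {installed} is newer than {KNOWN_GOOD}; "
                "consider downgrading")
    return f"x-transformers {installed} should be fine"


if __name__ == "__main__":
    print(check_x_transformers())
```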
Oh it's working fine after the downgrade... Thank you!!! Hope you can fix the bug soon : P
I pinned the version in the requirements for now. I might come back to it at some later point, but for right now that is sufficient.
At first, when I tried a complicated formula, it would get stuck for a long time and then return the wrong result.
Later, I found that even the simplest case is not recognized by this program.