lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.
https://lukas-blecher.github.io/LaTeX-OCR/
MIT License

Completely unusable #33

Closed itewqq closed 3 years ago

itewqq commented 3 years ago

At first, when I tried a complicated formula, it would get stuck for a long time and then return the wrong result.

Later, I found that even the simplest case is not recognized by this program.

(attached GIF: latexocr)

lukas-blecher commented 3 years ago

Hello, I'm sorry for the difficult first contact with the program. I'll try to explain what's happening.

First, regarding the complicated formula: could you show me what the input image was? Sometimes the network gets stuck in a feedback loop where it predicts the same token over and over again, and the result is then most likely wrong. I will implement an early stop condition in the future. Right now I can only recommend trying again, maybe with a different temperature, or splitting the image up into multiple smaller parts.
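As an illustration of the kind of early stop condition mentioned above, here is a minimal sketch, not the pix2tex implementation. It checks whether the tail of a generated token sequence is one short pattern repeated many times. The function name `is_stuck_in_loop` and the parameters `max_period` and `min_repeats` are hypothetical, and the default values are guesses that would need tuning, since legitimate formulas can also contain short repeats.

```python
def is_stuck_in_loop(tokens, max_period=4, min_repeats=5):
    """Return True if the sequence ends in a short pattern repeated many times.

    tokens      -- list of token ids (or strings) generated so far
    max_period  -- longest repeating pattern to look for (hypothetical default)
    min_repeats -- how many consecutive repeats count as a loop (hypothetical default)
    """
    for period in range(1, max_period + 1):
        window = period * min_repeats
        if len(tokens) < window:
            continue  # not enough tokens yet to judge this pattern length
        tail = tokens[-window:]
        pattern = tail[:period]
        # The tail is a loop if every position matches the pattern cyclically.
        if all(tail[i] == pattern[i % period] for i in range(window)):
            return True
    return False
```

A decoding loop could call this after each new token and stop generating once it returns True, instead of running until the maximum sequence length.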

Now, regarding the simple predictions. I am aware that very small images do not work as of right now. That is due to a mistake during training where I omitted small images from the dataset. In practice that means only images with a width of more than 96px work as expected (if I recall correctly). I'm sure that if you try with somewhat larger images the result will be as expected. Note that the \LaTeX command is not math typesetting and thus not in the training set, which is why that also didn't work.
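One way to act on the width limit is to enlarge a small snip before passing it to the model. The sketch below only computes the target size while preserving the aspect ratio. It assumes the threshold really is 96px (stated above from memory), and the name `upscaled_size` and the constant `MIN_WIDTH` are hypothetical. Whether upscaling gives results as good as taking a larger snip is not confirmed here.

```python
MIN_WIDTH = 97  # assumption: smallest width above the ~96px limit mentioned above


def upscaled_size(width, height, min_width=MIN_WIDTH):
    """Return (width, height) scaled up so that width >= min_width.

    Images that are already wide enough are returned unchanged.
    The aspect ratio is preserved; the height is rounded up.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width >= min_width:
        return width, height
    # Integer ceiling division avoids floating-point rounding surprises.
    new_height = -(-height * min_width // width)
    return min_width, new_height
```

The returned tuple can be passed to an image library's resize call, for example Pillow's `Image.resize`.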

Please let me know if you can verify what I've written down. Thanks

itewqq commented 3 years ago

@lukas-blecher Thanks for the reply! My issue's title may be kind of rude, but it's really sad that it seems everybody else can make it work except me... In fact, I am very grateful for the work you've done here.

For the complicated formula, it is just the one shown in my GIF above, which is part of the README file of this repo.

For the size issue you mentioned, I just tested another, bigger snip, but it is still not working: (attached image: latexocr)

lukas-blecher commented 3 years ago

Ok that should definitely not happen. On my machine there is no problem recognizing the formula whatsoever. Can you please provide some additional information?

itewqq commented 3 years ago

@lukas-blecher

  1. Windows 10 20H2 19042.1165

  2. Yes, I downloaded the weights.pth and image_resizer.pth mentioned in the README and put them in the checkpoints folder (attached image).

  3. I installed the dependencies in a conda env; here is my package info:

    (latexocr) PS C:\Users\hp\Desktop\Web\LaTeX-OCR> pip list
    Package                Version
    ---------------------- -------------------
    albumentations         1.0.3
    certifi                2021.5.30
    chardet                4.0.0
    charset-normalizer     2.0.4
    click                  8.0.1
    colorama               0.4.4
    cycler                 0.10.0
    einops                 0.3.2
    entmax                 1.0
    filelock               3.0.12
    huggingface-hub        0.0.16
    idna                   3.2
    imageio                2.9.0
    imagesize              1.2.0
    joblib                 1.0.1
    kiwisolver             1.3.2
    matplotlib             3.4.3
    munch                  2.5.0
    networkx               2.6.2
    numpy                  1.21.2
    opencv-python-headless 4.5.3.56
    packaging              21.0
    pandas                 1.3.2
    Pillow                 8.3.2
    pip                    21.0.1
    pynput                 1.7.3
    pyparsing              2.4.7
    PyQt5                  5.15.4
    PyQt5-Qt5              5.15.2
    PyQt5-sip              12.9.0
    PyQtWebEngine          5.15.4
    PyQtWebEngine-Qt5      5.15.2
    python-dateutil        2.8.2
    python-Levenshtein     0.12.2
    pytz                   2021.1
    PyWavelets             1.1.1
    PyYAML                 5.4.1
    regex                  2021.8.28
    requests               2.26.0
    sacremoses             0.0.45
    scikit-image           0.18.3
    scipy                  1.7.1
    screeninfo             0.7
    setuptools             52.0.0.post20210125
    six                    1.16.0
    tifffile               2021.8.30
    timm                   0.4.12
    tokenizers             0.9.4
    torch                  1.9.0
    torchtext              0.10.0
    torchvision            0.10.0
    tqdm                   4.62.2
    transformers           4.2.2
    typing-extensions      3.10.0.2
    urllib3                1.26.6
    wheel                  0.37.0
    wincertstore           0.2
    x-transformers         0.17.12

lukas-blecher commented 3 years ago

Thank you very much. I was able to replicate the issue. It turns out it was in the dependencies after all, namely x-transformers. I'll work on a fix. In the meantime you can downgrade to a previous version:

    pip install -U x-transformers==0.12.1

That should work. Thanks again for bringing the issue to my attention.
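To confirm which x-transformers version is active in the environment after the downgrade, a small check like the following can help. This is only a sketch: it uses the standard library's `importlib.metadata` (Python 3.8+), and the helper name `check_x_transformers` and the constant `KNOWN_GOOD` are hypothetical. It compares against 0.12.1 only because that is the version recommended in this thread; other versions may work too.

```python
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = "0.12.1"  # the version recommended in this thread


def check_x_transformers(installed=None):
    """Return a short status message about the installed x-transformers version.

    If `installed` is None, the version is looked up in the current environment.
    """
    if installed is None:
        try:
            installed = version("x-transformers")
        except PackageNotFoundError:
            return "x-transformers is not installed"
    if installed == KNOWN_GOOD:
        return f"x-transformers {installed} matches the known-good version"
    return f"x-transformers {installed} differs from {KNOWN_GOOD}; consider downgrading"


if __name__ == "__main__":
    print(check_x_transformers())
```

Running the script inside the conda env prints the status for whatever version pip actually installed there.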

itewqq commented 3 years ago

Oh it's working fine after the downgrade... Thank you!!! Hope you can fix the bug soon : P

lukas-blecher commented 3 years ago

I pinned the version in the requirements for now. I might come back to it at some later point, but for right now that is sufficient.