Closed abhishekthanki closed 3 years ago
Following.
One difference I notice is that no FMA is found on Ubuntu 20.04, so the hardware is different.
You could try the latest code from the master branch on all machines to see if that makes a difference.
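To make the comparison explicit, here is a minimal sketch that pulls the `Found ...` entries out of the version output quoted in the report and lists what each machine lacks relative to the others. The helper name `simd_features` and the `reports` dictionary are made up for illustration, and the strings are trimmed to the tail of each version line.

```python
import re

def simd_features(version_output):
    """Collect the names from 'Found <NAME>' entries in the version output."""
    return set(re.findall(r"Found (\w+)", version_output))

# Tail of the version line reported for each system in this thread.
reports = {
    "macOS 10.15.6": "Found AVX2 Found AVX Found FMA Found SSE",
    "Ubuntu 18.04": "Found AVX2 Found AVX Found FMA Found SSE",
    "Ubuntu 20.04": "Found AVX2 Found AVX Found SSE",
}

features = {name: simd_features(text) for name, text in reports.items()}
all_features = set().union(*features.values())

# For every system, the features seen elsewhere but not reported here.
missing = {name: all_features - found for name, found in features.items()}

for name, lacking in missing.items():
    print(name, "is missing:", sorted(lacking) or "nothing")
```

On the three reports above this flags only FMA on Ubuntu 20.04. It says nothing about whether that difference explains the accuracy gap.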
On Fri, Aug 21, 2020, 19:40 Abhishek Thanki notifications@github.com wrote:
Hello,
I'm fine-tuning tesseract on a custom dataset. I have been able to do this successfully without any issues; however, when it came to reproducing the results on various systems, I was unable to do so. When using the same dataset and parameters, I get ~23% accuracy on Ubuntu 18.04 and macOS 10.15.6, but on Ubuntu 20.04 I get ~46% accuracy. I'm wondering why there is such a huge difference in accuracy.
The following are the tesseract version details of all three systems:
- macOS 10.15.6:
tesseract 4.1.1-rc2-25-g9707 leptonica-1.80.0 libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1 Found AVX2 Found AVX Found FMA Found SSE
- Ubuntu 18.04:
tesseract 4.1.1-rc2-25-g9707 leptonica-1.79.0 libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 Found AVX2 Found AVX Found FMA Found SSE
- Ubuntu 20.04:
tesseract 4.1.1-rc2-25-g9707 leptonica-1.79.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1 Found AVX2 Found AVX Found SSE
As you can see, there is quite a bit of difference in the versions of the libraries tesseract depends on. Could that be the reason why the results are not reproducible?
Please note that the same instructions were followed on all systems (except for minor changes made on macOS for obvious reasons).
View it on GitHub: https://github.com/tesseract-ocr/tesstrain/issues/188
I also suggest repeating the test with the latest Tesseract. Reproducible training results are very important. Are the results the same when you repeat the training on the same machine?
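One rough way to answer the same-machine question is to run the training twice and compare the resulting files byte for byte. The sketch below does that with SHA-256 digests. The function names and the paths in the usage comment are hypothetical, and it is an assumption that the output files are meant to be byte-identical between runs: differing hashes show that the runs diverged somewhere, not that the accuracy differs.

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_output(run_a, run_b):
    """True if the two files have identical contents."""
    return sha256_of(run_a) == sha256_of(run_b)

# Hypothetical usage, with placeholder paths for two runs on one machine:
# print(same_output("run1/model.traineddata", "run2/model.traineddata"))
```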
@Shreeshrii That's the case because Ubuntu 20.04 is being run on a VM using VirtualBox (which does not support FMA). Do you think that could be the root cause of this issue?
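For context on why FMA could matter at all: a fused multiply-add rounds `a*b + c` once, while separate multiply and add instructions round twice, so the two paths can disagree in the low-order bits. The sketch below illustrates this in plain Python, emulating the fused result with exact rational arithmetic. It is not Tesseract code, the values are chosen only to make the effect visible, and whether such differences account for the accuracy gap in this issue is not established.

```python
from fractions import Fraction

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the exact product 1 - 2**-60 is rounded to 1.0
# before the addition, so the small term is lost.
unfused = a * b + c

# Emulated fused multiply-add: compute a*b + c exactly, then round once.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print("unfused:", unfused)
print("fused:  ", fused)
print("equal:  ", unfused == fused)
```

Here the unfused result is exactly 0.0 while the fused one is -2**-60. A single difference like this is tiny, but training repeats such operations many times, so runs with and without FMA need not stay bit-identical.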
@Shreeshrii @stweil I tried with the latest version of Tesseract and I'm getting the same results as before.
@abhishekthanki I fear that if you want a more concrete answer you will have to provide some sample output and the command you used for training at least. It would be perfect if you could provide a (minimal) data set which leads to the odd behavior.
No further progress. Most likely a tesseract problem. Closing.