naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

failed to load ./ita.special-words #881

Open varvello opened 9 months ago

varvello commented 9 months ago

I'm using https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.min.js

If you create an Italian worker: const worker = await Tesseract.createWorker("ita");

You'll see this error in the console: failed to load ./ita.special-words

It's located at tesseract-core-simd-lstm.wasm.js:31

Cheers

Balearica commented 9 months ago

I was able to replicate this. Presumably the default ita.traineddata file is missing a component that exists for other languages. Luckily, this does not appear to affect recognition: everything worked as normal, the warning message notwithstanding.

As Tesseract.js uses the same default .traineddata as the main Tesseract project (albeit possibly out of date), this is only actionable if that project has updated ita.traineddata to resolve this issue. I can look into this at some point when I have more time. In the meantime, users who want to use an alternative .traineddata file can do so by setting langPath.
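
For anyone who wants to try the langPath workaround, here is a minimal sketch. It assumes the v5 createWorker(langs, oem, options) signature and that the worker fetches `<langPath>/ita.traineddata.gz` when gzip is true. The URL in the usage note is a placeholder, and buildWorkerOptions / createItalianWorker are hypothetical helper names, not part of Tesseract.js.

```javascript
// Hypothetical helper: builds the options object passed to createWorker.
// Trailing slashes are stripped so the language file name can be joined
// to the path cleanly.
function buildWorkerOptions(langPath, gzip = true) {
  return {
    langPath: langPath.replace(/\/+$/, ""),
    gzip, // set to false if the hosted file is an uncompressed .traineddata
  };
}

// Hypothetical helper: creates an Italian worker that loads its
// .traineddata from a custom location instead of the default one.
// `Tesseract` is the global exposed by tesseract.min.js (or the imported module).
async function createItalianWorker(Tesseract, langPath, gzip = true) {
  // Second argument is the OCR engine mode; 1 selects the LSTM engine.
  return Tesseract.createWorker("ita", 1, buildWorkerOptions(langPath, gzip));
}
```

Usage would look something like const worker = await createItalianWorker(Tesseract, "https://example.com/tessdata"); with the placeholder replaced by wherever the alternative ita.traineddata is hosted. Whether a different file actually silences the warning depends on what that file contains.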