Closed laurent22 closed 1 week ago
Was this a one-time thing that was resolved once you deleted/refreshed the cache data, or can it be replicated? If it can be replicated, please provide a reproducible example.
I couldn't find where it stores the cache, and setting langPath didn't seem to have any effect. Where can I find the cache data? For now I have disabled the cache, but if I enable it again I think the problem will come back, and then I can share the cached files so that the bug can be replicated.
Files are cached at ${cachePath}/${lang}.traineddata, where cachePath is determined by the cachePath argument (. by default). For the browser version of Tesseract.js the file is cached in IndexedDB, and for the Node.js version of Tesseract.js the file is cached on the local file system.
For example, the following snippet will download eng.traineddata from IndexedDB in the browser. It must be run from the devtools console on a website that has previously saved eng.traineddata to the cache.
(async () => {
  // Open a connection to the database
  const openRequest = indexedDB.open('keyval-store');
  const db = await new Promise((resolve, reject) => {
    openRequest.onerror = () => reject(openRequest.error);
    openRequest.onsuccess = () => resolve(openRequest.result);
  });
  // Start a transaction and get the object store
  const transaction = db.transaction(['keyval'], 'readonly');
  const store = transaction.objectStore('keyval');
  // Use the key to get the cached file data
  const getRequest = store.get('./eng.traineddata');
  const data = await new Promise((resolve, reject) => {
    getRequest.onerror = () => reject(getRequest.error);
    getRequest.onsuccess = () => resolve(getRequest.result);
  });
  // Wrap the data in a Blob so it can be downloaded
  const blob = new Blob([data], {type: 'application/octet-stream'});
  // Create a URL for the blob
  const url = URL.createObjectURL(blob);
  // Create a temporary anchor element to trigger download
  const a = document.createElement('a');
  a.href = url;
  a.download = 'eng.traineddata';
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  // Revoke the blob URL after download
  URL.revokeObjectURL(url);
})();
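For the Node.js version, where the file is cached on the local file system, a rough equivalent is to resolve ${cachePath}/${lang}.traineddata and check whether it exists. The sketch below is only an illustration: findCachedTraineddata is a hypothetical helper, not part of Tesseract.js, and it assumes the cache path is resolved relative to the process working directory.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not a Tesseract.js API): resolves the expected
// cache location `${cachePath}/${lang}.traineddata` and reports whether
// a file is present there, along with its size in bytes.
// Assumption: cachePath is '.' unless overridden, as described above.
function findCachedTraineddata(lang, cachePath = '.') {
  const filePath = path.resolve(cachePath, `${lang}.traineddata`);
  if (!fs.existsSync(filePath)) {
    return { filePath, exists: false, size: 0 };
  }
  const { size } = fs.statSync(filePath);
  return { filePath, exists: true, size };
}

// Example: look for the English data in the default cache location.
const info = findCachedTraineddata('eng');
console.log(info);
```

If a file is found, it could be copied elsewhere to share for debugging. An unexpectedly small size might hint at a truncated download, though that is a guess rather than a confirmed cause.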
@laurent22 To follow up, were you ever able to replicate this issue in a reproducible way and/or figure out what you think the root cause is?
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
5.0.4
Describe the bug
This is the same issue as https://github.com/naptha/tesseract.js/issues/414, which should normally have been addressed by the errorHandler property, but apparently not in all cases. I'm using Tesseract.js with Electron and it gets stuck at the message { workerId: "Worker-0-ac418", status: "loading language traineddata", progress: 0 }
I set the errorHandler property but it's never triggered. I'm using the "lazy fox" default image.
The same fix mentioned in the other issue, setting cacheMethod: 'none', works, but I'd rather keep the cache enabled since downloading 10 MB every time wouldn't make sense.
Edit:
I've just discovered that Tesseract.js has a second way to log, Tesseract.setLogging, so I set that to true, but it didn't help. It just prints [Worker-0-e9fc5]: Start Job-1-4ae93, action=loadLanguage followed by the dreaded loading language traineddata message.
Device Version: