naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

Unable to set langPath to a blob URL #965

Open TheWorldEndsWithUs opened 1 month ago

TheWorldEndsWithUs commented 1 month ago


Describe the bug: When running Tesseract.js in the browser, I'd like to pass the language data via a blob URL because of restrictions in the environment the code will run in. However, when I pass the blob URL to langPath, the file fails to load.

To reproduce:

  1. Create a worker and set its langPath property to a blob URL.



Device: Chrome browser


Balearica commented 1 month ago

The langPath option points to a directory (either local or on a CDN) from which Tesseract.js automatically downloads the language data it needs. A blob URL refers to a single file, not a directory, so it would not make sense for langPath to accept one.
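To illustrate the directory-versus-file distinction, here is a minimal sketch. It assumes (not confirmed in this thread) that the worker builds each download URL by appending `/<lang>.traineddata` to langPath, plus `.gz` when gzip is enabled. The helper `resolveLangFileUrl` and the `example.com` URLs are hypothetical and only mirror that assumed behavior.

```typescript
// Hypothetical helper: mirrors the ASSUMED way a directory-style langPath
// is expanded into a per-language file URL. Not the library's actual API.
function resolveLangFileUrl(langPath: string, lang: string, gzip = true): string {
  // Strip one trailing slash so "dir/" and "dir" produce the same result.
  const dir = langPath.replace(/\/$/, "");
  return `${dir}/${lang}.traineddata${gzip ? ".gz" : ""}`;
}

// A directory (e.g. on a CDN) resolves to a concrete file inside it.
const cdnUrl = resolveLangFileUrl("https://example.com/tessdata/", "eng");
console.log(cdnUrl); // https://example.com/tessdata/eng.traineddata.gz

// A blob URL already names one file, so appending a path segment yields a
// URL that does not refer to any existing blob, and the fetch fails.
const blobUrl = "blob:https://example.com/0b6f1c2e-8d4a-4f8e-9a3b-2c1d5e6f7a8b";
console.log(resolveLangFileUrl(blobUrl, "eng"));
// blob:https://example.com/0b6f1c2e-8d4a-4f8e-9a3b-2c1d5e6f7a8b/eng.traineddata.gz
```

Under this assumption, the only way a blob could be used is if the worker accepted the file's contents (or a file URL) directly instead of a directory, which is what the rest of this thread discusses.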

If you would rather write language data to the worker manually than have Tesseract.js download it from a directory, follow the instructions provided in #794.

Edit: It looks like this question was answered in #794, but that answer was for an older version and may no longer apply. I would need to think about whether this is possible with the current interface.

TheWorldEndsWithUs commented 1 month ago

I wouldn't mind using an older version as long as it supports word-level OCR and is mostly stable. If this is possible with the newest version I would prefer that, but beggars can't be choosers. Thanks for your help. I've tried a number of experiments, hot-replacing code in the minified file so that it loads the data locally from a blob link, but none of them worked.

Balearica commented 3 weeks ago

The solution linked in #794 works with v4, but it no longer works in v5, which consolidated the createWorker, worker.initialize, and worker.loadLanguage functions. It should not be hard to add a feature to the current version that supports something similar, but this will require an update to the library.