naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

Uncaught NetworkError: Failed to execute 'importScripts' on 'WorkerGlobalScope': #851

Closed dpmylove closed 10 months ago

dpmylove commented 1 year ago

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)

Describe the bug A clear and concise description of what the bug is.

To Reproduce Steps to reproduce the behavior:

  1. Create the method in a JS file:

    import { createWorker } from "tesseract.js";

    const IS_BROWSER = typeof window !== 'undefined' && typeof window.document !== 'undefined';
    const OPTIONS = {
      cachePath: './tests/assets/traineddata',
      corePath: '../node_modules/tesseract.js-core',
      ...(IS_BROWSER ? { workerPath: '../../../node_modules/tesseract.js/dist/worker.min.js' } : {}),
    };
    let worker;

    async function performOCR(imagePath) {
      const image = path.resolve(__dirname, imagePath || './ocr_demo.png');
      console.log(`Recognizing ${image}`);
      worker = await createWorker('eng', 1, OPTIONS);

      await worker.reinitialize('eng');
      // Perform OCR
      const { data: { text } } = await worker.recognize(image);
      console.log(text);

      // Terminate the worker
      await worker.terminate();

      return text;
    }

    module.exports = { performOCR, };

  2. Create the command in command.js:

    const { performOCR } = require('../../cypress/e2e/demo/ocrUtils');

    Cypress.Commands.add('performOCR', (imagePath) => {
      return performOCR(imagePath);
    });

  3. Create a test case in the Cypress automation framework that uses this method:

    it.only('should perform OCR on an image', () => {
      // Call the performOCR function with the image path
      cy.performOCR('./in_img.jpg').then((ocrText) => {
        // Use the OCR text in your Cypress test
        cy.log(`OCR Text: ${ocrText}`);
      });
    });

Please attach any input image required to replicate this behavior.

performOCR ./in_img.jpg

(uncaught exception) Error: Uncaught NetworkError: Failed to execute 'importScripts' on 'WorkerGlobalScope': The script at 'http://localhost:57232/node_modules/tesseract.js/dist/worker.min.js' failed to load.

Error Uncaught NetworkError: Failed to execute 'importScripts' on 'WorkerGlobalScope': The script at 'http://localhost:57232/node_modules/tesseract.js/dist/worker.min.js' failed to load.

Expected behavior The text in the image should be recognized.

Device Version:

Additional context Whether workerPath is set to a local path or a CDN path, it fails with the error above. Tesseract.js used to work normally with the Cypress automation framework, but it no longer does.

Balearica commented 1 year ago

Please complete the first part of the issue template asking what version of Tesseract.js you are using.

Additionally, please post the error message (if any) that occurs when you leave workerPath as the default value.

The NetworkError appears to be fairly literal--indicating that the workerPath argument is not pointing to a valid worker.min.js file. Presumably http://localhost:57232/node_modules/tesseract.js/dist/worker.min.js does not exist on your local site. Therefore, I would expect the solution to be editing your workerPath or Cypress configuration.

I have personally used Cypress with the latest version of Tesseract.js for another project, so am disinclined to believe that Tesseract.js is incompatible with Cypress.

dpmylove commented 1 year ago

I am using the latest version, Tesseract.js v5.0.3. Below is the error I get with the default configuration. By the way, I can access http://localhost:57232/node_modules/tesseract.js/dist/worker.min.js in the browser, so I believe the file exists locally.

(uncaught exception) Error: Uncaught NetworkError: Failed to execute 'importScripts' on 'WorkerGlobalScope': The script at 'https://cdn.jsdelivr.net/npm/tesseract.js@v5.0.3/dist/worker.min.js' failed to load.

(uncaught exception) TypeError: Cannot create property 'isPending' on string 'Uncaught NetworkError: Failed to execute 'importScripts' on 'WorkerGlobalScope': The script at 'https://cdn.jsdelivr.net/npm/tesseract.js@v5.0.3/dist/worker.min.js' failed to load.'

Balearica commented 1 year ago

I do not know why importScripts would throw network errors when provided a valid path. You can try setting workerBlobURL: false in the createWorker options. This bypasses the importScripts statement that loads the worker.min.js file; however, importScripts is still used to download Tesseract.js-core, so that may simply lead to a different error message being thrown.
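As a rough sketch of where that option goes: it belongs in the options object passed as the third argument to createWorker. The helper name `buildWorkerOptions` and the example workerPath value are hypothetical and used only for illustration; only `workerBlobURL` itself comes from the suggestion above.

```javascript
// Hypothetical helper (not part of Tesseract.js): merges caller-supplied
// options with workerBlobURL: false so the setting cannot be overridden.
function buildWorkerOptions(overrides = {}) {
  return { ...overrides, workerBlobURL: false };
}

// Example workerPath is an assumption for illustration only.
const options = buildWorkerOptions({
  workerPath: '/node_modules/tesseract.js/dist/worker.min.js',
});
console.log(options);

// Assumed usage, with createWorker imported from tesseract.js:
// const worker = await createWorker('eng', 1, options);
```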

If the issue persists and you want me to look further, I would need to be provided a reproducible example repo that I can clone and run. As noted above, I am able to use Cypress with the latest version of Tesseract.js on my system and it works as expected.

Balearica commented 1 year ago

Also, you should delete the worker.reinitialize statement. createWorker returns a worker that is already initialized in the language that you specify. The function worker.reinitialize re-initializes the worker with a new language/OEM--it only needs to be run if you want to switch from English to Chinese (for example) with an existing worker.
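For illustration, a minimal sketch of the same flow without the reinitialize call might look like the following. The function name `recognizeText` is hypothetical, and createWorker is passed in as a parameter (an assumption made so the sketch does not depend on how tesseract.js is imported); the calls on the worker are the ones already used in the original report.

```javascript
// Hypothetical wrapper: createWorker is injected by the caller, e.g. the
// function exported by tesseract.js.
async function recognizeText(createWorker, image, options = {}) {
  // The returned worker is already initialized for 'eng'; no reinitialize needed.
  const worker = await createWorker('eng', 1, options);
  try {
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Terminate the worker even if recognition throws.
    await worker.terminate();
  }
}

// Assumed usage:
// import { createWorker } from 'tesseract.js';
// const text = await recognizeText(createWorker, '/ocr_demo.png');
```

worker.reinitialize would only come back into the picture if the same worker later needed to switch to a different language or OEM.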

dpmylove commented 1 year ago

Thanks for responding. After I set workerBlobURL: false, the address changed to the invalid address below: 'http://localhost:59744/__cypress/node_modules/tesseract.js-core/tesseract-core-simd-lstm.wasm.js' failed to load. I then tried using an absolute path, and the script failed with the following error: Failed to construct 'Worker': Access to the script at 'file:///C:/Users/tdeng/Documents/tesseractjs/node_modules/tesseract.js/dist/worker.min.js' is denied by the document's Content Security Policy.

Balearica commented 1 year ago

Browsers are categorically disallowed from accessing files on your local file system--any path starting with file:// will not work in a browser for security reasons. Additionally, you generally don't want to be using paths relative to the current file (i.e. paths starting with ./ or ../) for langPath or workerPath as these will be broken if the script is run out of a different directory (which is presumably what Cypress is doing).

Instead, if you want to always access the node_modules directory regardless of where the script is being run, you would do /node_modules/tesseract.js-core, which will always resolve to [domain]/node_modules/tesseract.js-core.
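The difference between the two path styles can be sketched with the standard URL constructor, which applies the same resolution rules a browser uses. The page URL below is made up for illustration (an assumption, not taken from Cypress documentation); only the two path strings come from this thread.

```javascript
// Hypothetical URL of the page the script runs from (assumed for illustration).
const pageUrl = 'http://localhost:59744/__cypress/iframes/spec.html';

// A path starting with ../ is resolved against the current page's directory.
const relative = new URL('../node_modules/tesseract.js-core', pageUrl).href;

// A path starting with / is resolved against the site root, wherever the page lives.
const rootRelative = new URL('/node_modules/tesseract.js-core', pageUrl).href;

console.log(relative);     // http://localhost:59744/__cypress/node_modules/tesseract.js-core
console.log(rootRelative); // http://localhost:59744/node_modules/tesseract.js-core
```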

dpmylove commented 1 year ago

I have tried the suggestions above, but I am still not able to make the script work in the Cypress environment. So can we conclude that Tesseract.js is incompatible with Cypress? If so, do you plan to make Tesseract.js officially support Cypress testing? I think it would be a great help for visual testing. Thanks!

dpmylove commented 1 year ago

Especially where there are canvas components, Tesseract.js would make a big difference for verifying messages rendered in the canvas.

Balearica commented 1 year ago

As I stated above, Tesseract.js can be used with Cypress. I personally use Cypress with a project that uses the latest version of Tesseract.js.

It sounds like there is something wrong with your testing environment or your Cypress configuration, or the specific version of Cypress you are using is bugged. The error you report above is that importScripts([path]) is failing, despite path being valid. This cannot be a bug with Tesseract.js.

If you create a repo with a minimal reproducible example with Cypress + Tesseract.js, then I can clone and look into this further. However, without something I can clone and run, there is nothing further I can do to help with this.

dpmylove commented 1 year ago

I zipped my project, and the issue can be reproduced with it. I hope this helps to figure out the problem. [Uploading tesseractjsdebug.zip…]()

dpmylove commented 12 months ago

It seems the zipped file has a problem, so I have uploaded my Cypress code to the following GitHub location. Could you please have a look? Thanks! https://github.com/dpmylove/ShareForDebugTesseractJS

dpmylove commented 12 months ago

I found more information about the error: Refused to load the script 'http://localhost:63292/node_modules/tesseract.js/dist/worker.min.js' because it violates the following Content Security Policy directive: "script-src 'unsafe-eval'". Note that 'script-src-elem' was not explicitly set, so 'script-src' is used as a fallback. How can I avoid this restriction?

Balearica commented 12 months ago

I looked at your repo, and it does not look like you have a website for Cypress to test. Rather, it looks like you're trying to run recognition within the .cy.js files defining the test specifications. This will not work, and would not be desirable to do even if it did work, as it would mean that no separate application is being tested.

Cypress should be used to test a website. The JavaScript code that contains the Cypress tests should only contain the testing code. It should not attempt to run application code (in this case, recognition).

Separate from the testing code, you should have a website that recognizes an image using Tesseract.js. The examples in this repo can serve as a basic starting point. Then, you can write a Cypress script that interacts with the website to run recognition.

dpmylove commented 12 months ago

I actually ran into this problem while testing a website, and I created this repo to simplify things. The error happens while Tesseract.js is initializing, so I do not think the lack of a website under test is the cause. Anyway, I will update the repo to test a website later.

Balearica commented 10 months ago

Closing this issue as I believe Tesseract.js does work with Cypress and any issues are due to a particular project configuration. We can reopen if a reproducible example repo of Cypress failing to work with the basic example code is provided.