naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

Executing `worker.recognize` repeatedly causes "Out of Memory" error in JSFiddle #920

Closed horihiro closed 5 months ago

horihiro commented 5 months ago

If any part of this description is inappropriate, please point it out.

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo) I believe the latest version is used, because Tesseract.js is loaded with the following script tag

<script src='https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.min.js'></script>

Describe the bug When worker.recognize is executed repeatedly, an Out of Memory error occurs.

To Reproduce Steps to reproduce the behavior:

  1. Go to 'https://jsfiddle.net/pgr36sct/5/'
    On this page, worker.recognize is executed repeatedly via requestAnimationFrame, and each result is written to the console.
  2. Wait until the number of output lines in the console at the bottom of the window reaches 30.
    [image]
  3. In many cases, an Out of Memory error like the one in the following screenshot appears before the number reaches 30.
    [image]

Expected behavior I expected that the error would not occur and that the repeated calls to worker.recognize could continue.

Balearica commented 5 months ago

Edit: This explanation is not the root cause for this user, however it may be useful for other users experiencing an 'out of memory' error. See comments below.

Copying the code from the JSFiddle below for the benefit of other users, as opening it will indeed freeze/crash the page.

let worker;
let i = 0;
const x = 50; // run OCR once every x animation frames
const image = document.querySelector('img');
async function OCRImageByTesseract() {
  i++;
  if (i % x == 0) {
    // Create the worker lazily on first use, then reuse it.
    worker = worker || await Tesseract.createWorker('eng');
    const result = await worker.recognize(image.src);
    console.log(i / x, result);
  }
  // Reached only after any awaited recognition above has finished.
  requestAnimationFrame(OCRImageByTesseract);
}
// loop start;
requestAnimationFrame(OCRImageByTesseract);

Short answer: I believe this would be resolved by switching to using a scheduler rather than using worker.recognize. The basic syntax for schedulers is explained here, and there is a scheduler example in the examples directory.

Longer answer: I believe this issue is due to the fact that this code sends new jobs to the worker before the previous job is completed. Workers have no mechanism for queuing jobs: they were written with the assumption that a new call to worker.recognize would not be made until the previous call had completed. Support for running jobs asynchronously and/or in parallel was added later, with the addition of schedulers. As a result, Tesseract.js behaves in unexpected and undesirable ways when that assumption does not hold. This was recently discussed in #875.
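
For readers who do need to submit overlapping jobs, the scheduler approach suggested above might look roughly like the sketch below. It assumes Tesseract.js v5 and uses `createScheduler`, `addWorker`, `addJob`, and `terminate`; the helper name `recognizeAll` and its `tesseract` parameter are hypothetical and exist only so the snippet does not depend on a global.

```javascript
// Sketch: route recognition jobs through a scheduler, which queues them
// and hands each one to a worker only when that worker is idle.
// `tesseract` is the Tesseract.js module (the global `Tesseract` when it is
// loaded from the CDN script tag); `recognizeAll` is a hypothetical helper.
async function recognizeAll(tesseract, images) {
  const scheduler = tesseract.createScheduler();
  const worker = await tesseract.createWorker('eng');
  scheduler.addWorker(worker);
  try {
    // Jobs can be submitted all at once; the scheduler queues them
    // instead of sending them to a busy worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminates the scheduler together with the workers added to it.
    await scheduler.terminate();
  }
}
```

In the browser this could be called as `recognizeAll(Tesseract, [image.src, image.src]).then(console.log)`. Adding more workers with further `addWorker` calls would let the scheduler run jobs in parallel.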

horihiro commented 5 months ago

Thank you @Balearica. Let me confirm one thing.

Doesn't the code below wait for worker.recognize to finish, since it uses await and assigns the return value to result?

const result = await worker.recognize(image.src);

What I want to do is simply execute worker.recognize repeatedly, not in parallel.

Balearica commented 5 months ago

You're right, my original explanation was incorrect. I was unfamiliar with the requestAnimationFrame function; however, it looks like calling it is equivalent to calling OCRImageByTesseract just once. Therefore, this snippet waits for worker.recognize to finish before running it again.
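
The point about `await` can be checked with a small stand-alone sketch. It is only an illustration: `fakeRecognize` and `nextFrame` are hypothetical stand-ins for worker.recognize and requestAnimationFrame (built on `setTimeout`) so that the snippet runs without Tesseract.js or a browser, and the counters exist solely to detect overlapping calls.

```javascript
// Sketch: an awaited call inside a self-rescheduling loop never overlaps.
// `fakeRecognize` and `nextFrame` are hypothetical stand-ins for
// worker.recognize and requestAnimationFrame.
let inFlight = 0;     // calls currently running
let maxInFlight = 0;  // highest number of simultaneous calls observed

function fakeRecognize() {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => {
    setTimeout(() => {
      inFlight--;
      resolve('text');
    }, 20);
  });
}

const nextFrame = (callback) => setTimeout(callback, 1);

function runLoop(iterations) {
  return new Promise((resolve) => {
    let count = 0;
    async function step() {
      await fakeRecognize(); // the function is suspended here ...
      count++;
      if (count < iterations) {
        nextFrame(step);     // ... so the next call is scheduled only afterwards
      } else {
        resolve(maxInFlight);
      }
    }
    nextFrame(step);
  });
}
```

`runLoop(5).then((max) => console.log(max))` logs 1, meaning no two calls were ever in flight at the same time. This matches the structure of the JSFiddle code, where requestAnimationFrame is reached only after the awaited worker.recognize call has resolved.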

I do not know why this code is causing the page to crash in JSFiddle, however I now suspect the issue is with JSFiddle rather than Tesseract.js. I was unable to replicate this issue outside of JSFiddle, even when copy/pasting the exact code from the JSFiddle that crashes.

If you are able to replicate this problem using a standard web server, please create a repo with a reproducible example, or alternatively paste an HTML snippet that can be run as a single-file site, and I can look into it further. If the issue cannot be replicated anywhere outside of JSFiddle, then the issue should be raised with that project.

horihiro commented 5 months ago

Thank you @Balearica! I will check whether the issue can be reproduced outside of JSFiddle.

horihiro commented 5 months ago

I ran the same code on CodePen, but the issue could not be reproduced there. So it may depend on JSFiddle, as you suspected.