naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

Tesseract.js fails on nodejs when trying to package it as standalone #882

Closed reisalin-stout closed 5 months ago

reisalin-stout commented 5 months ago

Tesseract.js version: 5.0.4

Describe the bug: When packaging with pkg, tesseract.js silently fails on createWorker, even when the two folders (tesseract.js and tesseract-core) are included separately. I can't log any error, either with logger/errorHandler or in a try/catch block.

I would also add that it probably fails because of this error: "TypeError: r.g.addEventListener is not a function". But it could also be that I am making a mistake when including the worker/tesseract and lang files separately. If my packaged app serves a webpage with a tesseract script, it works; I just can't seem to use it from the packaged app itself.

Device Version:

Balearica commented 5 months ago

Issues related to packaging are generally caused by:

  1. Paths (workerPath and/or corePath) needing to be set manually
    1. See this comment
  2. Using the browser code when targeting Node.js (or vice versa)
    1. If you are using Node.js, make sure you are using the Node.js code. If you are targeting a browser, be sure to use the browser code.
    2. This sounds like a likely cause here--addEventListener is only used in the browser version of Tesseract.js, so if you are trying to build for Node.js, then that indicates you are using the wrong version.

If the above does not answer your question, we would need a reproducible example repo to troubleshoot further.
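Both points can be sketched for Node.js as follows. This is a minimal sketch, not the library's prescribed setup: `resolveAssetDir` and `buildWorkerOptions` are hypothetical helpers, the `dist` folder layout is borrowed from this thread, and the `process.pkg` check assumes the app is packaged with pkg, which is expected to set that property at runtime. The option names (`workerPath`, `corePath`, `langPath`, `cachePath`) are the ones passed to `createWorker` elsewhere in this thread.

```javascript
const path = require("path");

// Hypothetical helper: pick the folder that holds the unpacked Tesseract files.
// Assumption: under pkg, `process.pkg` is defined and the external files sit
// next to the executable; under plain `node`, they sit under the working directory.
function resolveAssetDir(subdir = "dist") {
  const base = process.pkg ? path.dirname(process.execPath) : process.cwd();
  return path.resolve(base, subdir);
}

// Hypothetical helper: build createWorker options from absolute paths only,
// pointing workerPath at the Node.js worker script, not the browser bundle.
function buildWorkerOptions(assetDir) {
  return {
    workerPath: path.join(assetDir, "src", "worker-script", "node", "index.js"),
    corePath: path.join(assetDir, "core"),
    langPath: path.join(assetDir, "lang"),
    cachePath: path.join(assetDir, "lang"),
  };
}
```

It could then be used as `createWorker("eng", 1, buildWorkerOptions(resolveAssetDir()))`. Absolute paths avoid any dependence on where the executable happens to be launched from.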

reisalin-stout commented 5 months ago

I am not a professional coder, so my code is currently pretty messy, and I am working with material I can't share. I can say that my only code for tesseract is:


    const { createWorker } = require("tesseract.js");
    console.log("creating worker");
    const worker = await createWorker("eng");
    console.log("worker loaded");
    await worker
      .recognize("https://tesseract.projectnaptha.com/img/eng_bw.png")
      .then((result) => {
        console.log("result :");
        console.log(result.data.text);
      });

I also tried creating the worker with:

        const worker = await createWorker("eng", 1, {
          corePath: "dist/core",
          langPath: "dist/lang",
          cachePath: "dist/lang",
          workerPath: "dist/worker.min.js",
          gzip: false, // also tried omitting this
        });

[Screenshot: tesseract-filepath]

I tried requiring "tesseract.min.js" (which is why it's in the screenshot) like I would in a browser, but I get "TypeError: r.g.addEventListener is not a function". That is somewhat expected, since I'm not using it in a browser context. What puzzles me is that neither a try/catch nor a logger/errorHandler function on the worker gives any output whatsoever.
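When a promise such as the one returned by `createWorker` neither resolves nor rejects, a try/catch around the `await` never fires, because nothing is thrown. One generic way to make such a hang visible is to race the promise against a timer. The sketch below is plain JavaScript, not a Tesseract.js feature, and it does not explain why the worker stalls; `withTimeout` and the 30-second limit in the usage note are hypothetical.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so a finished call does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

For example, `await withTimeout(createWorker("eng"), 30000, "createWorker")` inside a try/catch would turn a silent stall into a catchable error after 30 seconds.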

In both cases it works when run with node (I did that to check the syntax and that the paths were right), but after packaging it doesn't. Note that I run my exe from the same location as when I run it with node, and I tried both bundling the path folders and excluding them when running pkg.

Just to be sure, I am adding a screenshot of what I include in the external folders (maybe I misread the FAQ entry "Local Installation"?). I greatly appreciate your interest and help, thank you.

reisalin-stout commented 5 months ago

Thank you very much! With your help and the post you linked, I was able to solve this. Leaving a verbose response in case anyone else ever needs it.

My code and paths:

    const { createWorker } = require("tesseract.js");
    logmsg("creating worker");
    const worker = await createWorker("eng", 1, {
      workerPath: "./app/web/dist/src/worker-script/node/index.js",
      corePath: "./app/web/dist/core/",
      cachePath: "./app/web/dist/lang/",
    });
    logmsg("worker loaded");
    await worker
      .recognize("https://tesseract.projectnaptha.com/img/eng_bw.png")
      .then((result) => {
        logmsg("result :");
        logmsg(result.data.text);
      });

    logmsg("moving on");

I copied the whole folder from ./node_modules/tesseract.js/src to another directory and pointed the worker path there. I also included the core files as described in the Local Installation FAQ, plus a pre-downloaded eng.traineddata. Picture showing the code and the relative folder structure:

[Screenshot: tesseract-local-fix]
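Since a wrong path turned out to be the cause here, a preflight check before `createWorker` can turn a silent failure into a readable message. The sketch below is only an illustration: `findMissingAssets` is a hypothetical helper, and the expectation that `eng.traineddata` sits directly inside `cachePath` follows the folder layout described in this comment, not a documented requirement.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return the configured paths that do not exist on disk.
function findMissingAssets({ workerPath, corePath, cachePath }, lang = "eng") {
  const expected = [
    workerPath,
    corePath,
    // Assumption: the pre-downloaded language file sits directly in cachePath.
    path.join(cachePath, `${lang}.traineddata`),
  ];
  return expected
    .map((p) => path.resolve(p)) // report absolute paths so the message is unambiguous
    .filter((p) => !fs.existsSync(p));
}
```

Calling it with the same options object that is passed to `createWorker`, and throwing if the returned array is non-empty, gives an explicit list of what the packaged app could not find.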