robertknight / tesseract-wasm

JS/WebAssembly build of the Tesseract OCR engine for use in browsers and Node
https://robertknight.github.io/tesseract-wasm/
BSD 2-Clause "Simplified" License

Failed to execute 'postMessage' on 'MessagePort': #<Promise> could not be cloned. #83

Closed gmlloves closed 1 year ago

gmlloves commented 1 year ago

Hello,

I am working on a Blazor WebAssembly app and I'd like to use your library. I tried modifying your example, building it, and copying the files (ocr-app.bundle.js, tesseract-core.wasm, tesseract-core-fallback.wasm, tesseract-worker.js) into my project, but I got this error:

Failed to execute 'postMessage' on 'MessagePort': #<Promise> could not be cloned.
Error: Failed to execute 'postMessage' on 'MessagePort': #<Promise> could not be cloned.
    at ocr-app.bundle.js:256
    at new Promise (<anonymous>)
    at requestResponseMessage (tesseract.js:244)
    at Object.apply (tesseract.js:180)
    at OCRClient.loadImage (tesseract.js:346)
    at async runOCR (ocr.js:34)

https://github.com/robertknight/tesseract-wasm/blob/main/examples/web/ocr-app.js

import { OCRClient } from "tesseract-wasm";
export function OcrClient() { return new OCRClient() };

Index.html (my project)

  <script type="module">
      import { OcrClient } from "./js/ocr-app.bundle.js";
      window.OcrClient = OcrClient;
  </script>

ocr.js (my project)

async function runOCR(base64String) {
    let start = new Date();

    // Initialize the OCR engine. This will start a Web Worker to do the
    // work in the background.
    const ocr = OcrClient();
    const imageResponse = await fetch(base64String);
    const imageBlob = await imageResponse.blob();
    const image = await createImageBitmap(imageBlob);

    try {
        // Load the appropriate OCR training data for the image(s) we want to
        // process.
        await ocr.loadModel('js/eng.traineddata.js');
        await ocr.loadImage(createImageBitmap(image));

        // Perform text recognition and return text in reading order.
        const text = await ocr.getText();

        console.log('OCR text: ', text);
    } finally {
        // Once all OCR-ing has been done, shut down the Web Worker and free up
        // resources.
        ocr.destroy();
    }

    let end = new Date();
    console.log((end - start) / 1000);
    return text;
}

Can you help me? Is there a way to include it with a plain script tag (maybe from a CDN)?

Thank you in advance.

robertknight commented 1 year ago

The return value of createImageBitmap is a promise. The error is occurring on this line because you are passing a Promise<ImageBitmap> instead of an ImageBitmap to ocrClient.loadImage:

await ocr.loadImage(createImageBitmap(image));

Here image is already an ImageBitmap because it came from await createImageBitmap(...) above.
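
To illustrate why the browser reports "could not be cloned": postMessage copies its argument with the structured clone algorithm, and a Promise is not a cloneable type. The sketch below is only an illustration and is not part of tesseract-wasm. It uses the global structuredClone() function (available in current browsers and Node 17+), which applies the same algorithm, and canBeCloned is a hypothetical helper name. A Uint8Array stands in for the image because ImageBitmap exists only in browsers.

```javascript
// structuredClone() runs the same structured clone algorithm that
// postMessage() uses to copy its argument, so it reproduces the failure
// without needing a worker.
function canBeCloned(value) {
  try {
    structuredClone(value);
    return true;
  } catch (err) {
    // Uncloneable values such as Promises raise a DataCloneError.
    return false;
  }
}

// Stand-in for image data; a real app would hold an ImageBitmap here.
const pixels = new Uint8Array([1, 2, 3]);
const pending = Promise.resolve(pixels);

console.log(canBeCloned(pending)); // false: a Promise cannot be cloned
console.log(canBeCloned(pixels)); // true: the resolved value can
```

Awaiting the promise first, as the code in the question already does when it assigns image, yields a value that can be sent to the worker.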

Regarding this line:

 await ocr.loadModel('js/eng.traineddata.js');

Is this file exactly the same as https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/main/eng.traineddata, or did you modify it? That file should not have a .js extension because it isn't a JavaScript file. It is a binary blob of neural network weights.
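
Putting both points together, a corrected version of the function from the question might look like the sketch below. It uses only the OCRClient methods that already appear in this thread (loadModel, loadImage, getText, destroy). The function name recognizeText, its parameters, and the model path are assumptions made for illustration. It also returns the text from inside the try block, because a const declared inside try is not visible after the block ends.

```javascript
// Sketch only. `ocr` is expected to be an OCRClient from tesseract-wasm,
// `imageBitmap` an already-resolved ImageBitmap (not a Promise), and
// `modelUrl` the URL of the unmodified eng.traineddata file.
async function recognizeText(ocr, imageBitmap, modelUrl) {
  try {
    // Load the OCR training data, then hand the decoded image to the worker.
    await ocr.loadModel(modelUrl);
    await ocr.loadImage(imageBitmap);

    // Return from inside `try`; the `finally` block still runs afterwards.
    return await ocr.getText();
  } finally {
    // Shut down the Web Worker and free its resources.
    ocr.destroy();
  }
}

// Possible browser usage (path and variable names are assumptions):
//   const image = await createImageBitmap(imageBlob);
//   const text = await recognizeText(OcrClient(), image, "js/eng.traineddata");
```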

gmlloves commented 1 year ago

Hello, Robert,

The return value of createImageBitmap is a promise. The error is occurring on this line because you are passing a Promise<ImageBitmap> instead of an ImageBitmap to ocrClient.loadImage:

Oops... That was my mistake. It works now :)

Is this file exactly the same as https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/main/eng.traineddata, or did you modify it? That file should not have a .js extension because it isn't a JavaScript file. It is a binary blob of neural network weights.

Yes, it is the same file. I was just testing and hadn't configured the server to serve .traineddata files.

Thank you. Gaston