wojtekmaj / react-pdf

Display PDFs in your React app as easily as if they were images.
https://projects.wojtekmaj.pl/react-pdf
MIT License

pdfjs crashes on getDocument if worker is set using pdfjs.GlobalWorkerOptions.workerPort and second file is rendered #1838

Open jkgenser opened 1 week ago

jkgenser commented 1 week ago

Description

I am doing some defensive programming: I load my worker code in the background as my app starts. One problem with the underlying pdfjs is that if the worker fetch fails, the app cannot recover.

A pdfjs maintainer recommends doing this in order to be robust to such failures: https://github.com/mozilla/pdf.js/issues/14332#issuecomment-984764484

Here is a subset of my code. Note that I use the blob approach to get around CORS, because the worker file is loaded from a different URL. However, the crash described below happens whether or not a blob is used to instantiate the worker.

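// Assumed import: the issue does not show it, but react-pdf re-exports pdfjs.
import { pdfjs } from 'react-pdf';

// Fetches the worker script and installs it as the shared pdfjs worker,
// retrying up to `retries` times before falling back to workerSrc.
export async function fetchWithRetry(
  url: string,
  retries: number = 2,
): Promise<void> {
  const attemptFetch = async (attempt: number): Promise<void> => {
    try {
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
      }
      // Wrap the script in a blob URL to get around CORS.
      const workerScript = await response.text();
      const blob = new Blob([workerScript], { type: 'application/javascript' });
      const workerUrlBlob = URL.createObjectURL(blob);
      // Setting workerPort (rather than workerSrc) is what triggers the crash.
      pdfjs.GlobalWorkerOptions.workerPort = new Worker(workerUrlBlob, {
        type: 'module',
      });
      // pdfjs.GlobalWorkerOptions.workerSrc = workerUrlBlob;
    } catch (error) {
      if (attempt < retries) {
        console.log(`Retrying fetch... Attempt ${attempt + 1}`);
        await attemptFetch(attempt + 1);
      } else {
        // Out of retries: let pdfjs load the worker from the original URL.
        console.error('Error loading PDF worker script', error);
        pdfjs.GlobalWorkerOptions.workerSrc = url;
      }
    }
  };

  await attemptFetch(0);
}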

Steps to reproduce

  1. Configure pdfjs.GlobalWorkerOptions.workerPort with a web worker, using the snippet below.
  2. Render a PDF using file_a.
  3. Pass file_b to the component that renders the document.

pdfjs.GlobalWorkerOptions.workerPort = new Worker(workerUrlBlob, {
  type: 'module',
});

Expected behavior

PDF renders file_b

Actual behavior

Application crashes. Here is the relevant part of the stack trace:

Error: PDFWorker.fromPort - the worker is being destroyed.
Please remember to await `PDFDocumentLoadingTask.destroy()`-calls.
    at _PDFWorker.fromPort (api.js:2299:15)
    at getDocument (api.js:355:19)
    at loadDocument (Document.js:238:15)

Additional information

I suspect this may be related to the following issue raised in pdfjs:

https://github.com/mozilla/pdf.js/issues/16777

Here is the relevant line in the pdfjs pull request that addresses that issue:

https://github.com/mozilla/pdf.js/pull/16830/files#diff-082d6b37ad01db7ac97cc07c6ddb0dc52040484c5ef91b110b072f50144d9f39R2305

In short, I believe the crash comes from the destroy() call not being awaited when workerPort is used.
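To illustrate the suspected cause, here is a minimal sketch of the "await destroy before loading the next document" pattern that the error message asks for. It is not react-pdf's or pdfjs's actual code. `SerializedLoader` and `LoadingTaskLike` are hypothetical names, and the `open` callback stands in for whatever creates the loading task (for example, a call to `getDocument`).

```typescript
// Minimal shape of a loading task: only the two members this sketch relies on.
// (Assumption: modeled on a pdfjs loading task, which exposes `promise` and `destroy()`.)
interface LoadingTaskLike<TDoc> {
  promise: Promise<TDoc>;
  destroy(): Promise<void>;
}

// Hypothetical helper, not part of pdfjs or react-pdf: serializes loads so the
// previous task is fully destroyed before the next one is opened.
class SerializedLoader<TSrc, TDoc> {
  private readonly open: (src: TSrc) => LoadingTaskLike<TDoc>;
  private current: LoadingTaskLike<TDoc> | null = null;
  private queue: Promise<void> = Promise.resolve();

  constructor(open: (src: TSrc) => LoadingTaskLike<TDoc>) {
    this.open = open;
  }

  load(src: TSrc): Promise<TDoc> {
    const result = this.queue.then(async () => {
      const previous = this.current;
      this.current = null;
      if (previous) {
        // The key step: wait for teardown to finish before reusing the worker.
        await previous.destroy();
      }
      const task = this.open(src);
      this.current = task;
      return task.promise;
    });
    // Keep the queue usable even if this particular load fails.
    this.queue = result.then(() => undefined, () => undefined);
    return result;
  }
}
```

With pdfjs, `open` could plausibly be `(src) => pdfjs.getDocument(src)`, but since react-pdf calls getDocument internally, a change like this would have to live inside the library. Note that this sketch also makes the second load wait for the first one to finish loading, which a real fix might want to avoid.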

Should we assume that workerPort is not supported by this library, since it results in a crash when a different file is passed in while a worker is already running?

Environment