stephanrauh / ngx-extended-pdf-viewer

A full-blown PDF viewer for Angular 16, 17, and beyond
https://pdfviewer.net
Apache License 2.0
489 stars 184 forks source link

Setting up fake worker failed: "Failed to fetch dynamically imported module" #2554

Open Hollow-Ego opened 2 months ago

Hollow-Ego commented 2 months ago

Hi there, hello,

we noticed that sometimes the pdf viewer breaks. We haven't found a consistent way to reproduce the error. Following is what I've learned from my investigations.

This happens in versions 20.5.0 and 21.3.8.

Description

We are using the pdf viewer in a dialog. Opening and closing the dialog multiple times sometimes leads to the pdf viewer breaking. The following error is shown in the console grafik Transcript:

 [GET http://localhost:4200/assets/assets/pdf.worker-4.3.657.min.mjs net::ERR_ABORTED 404 (Not Found)](viewer-4.5.732.min.mjs:23  Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/html". Strict MIME type checking is enforced for module scripts per HTML spec.)

An error occurred while loading the PDF.

PDF.js v4.5.732 (build: b409d2f8b)
Message: Setting up fake worker failed: "Failed to fetch dynamically imported module: https://pdfviewer.net/assets/assets/pdf.worker-4.5.732.min.mjs".

ERROR Error: Setting up fake worker failed: "Failed to fetch dynamically imported module: https://pdfviewer.net/assets/assets/pdf.worker-4.5.732.min.mjs".

This error occurs rather rarely and can only be solved for the user by completely refreshing the page. However using the debug tools one can force this error. I believe this means, that the cause is not coming from our application. The viewer is destroyed every time the dialog is closed.

General Steps to Reproduce

As said above, there is no reliable way to naturally reproduce this error. Following the steps below one can force this error to occur through the debug tools.

  1. Open the debug tools on the page with a loaded pdf viewer
  2. Go to the "Sources" tab
  3. Open {yourhost}/assets/pdf-{version}.min.js grafik
  4. Set a breakpoint at if (!this.destroyed && t) in the i.on("test", (t => { block grafik
  5. Optionally set another breakpoint at return shadow(this, "_setupFakeWorkerGlobal", (async () => { in static get _setupFakeWorkerGlobal() grafik
  6. Refresh the page. The breakpoint should trigger and pause the execution
  7. Manually set the variable t to ``false grafik
  8. Continue execution until no breakpoints remain
  9. In the console you now see the error grafik

Workarounds

There is no workaround to my knowledge. Edit: You could just copy the pdf.worker files to /assets/assets or add a redirect to your server that resolves /assets/assets to just /assets

Possible cause

For this error to naturally occur I suspect that the worker was not cleaned up entirely or correctly. Maybe the browser was too slow to clean it up before the next time opening the viewer. Again, as this is inconsistent and rarely occurs it is hard to say.

The cause for the error message itself lies in the dynamic import for the pdf.worker file. (await import(this.workerSrc) this.workerSrc is set to './assets/pdf.worker-4.5.732.min.mjs'. However the import function, to my knowledge, tries to resolve this relative to where it is located. Since the viewer file is already located under "/assets" it tries to resolve it to "assets/assets" grafik

Possible solution

From what I've gathered from the source code the workerSrc is `the combination of the assetsFolder and the worker filename.

So a solution could be to either allow the configuration of the workerSrc option, same as with the workerPort or to just drop the usage of the assetsFolder. Given the current documentation, it would probably be preferred to not expose the workerSrc. This could look like this grafik

Since I don't know how this could be setup in a test or if this could have any other impacts that I'm not aware of, I haven't made a PR for this fix. But I would be willing to make one

stephanrauh commented 2 months ago

It seems the relative path doesn't always work in your application. Can you find out why the duplicate assets/ is added?

In the meantime, there's an easy fix. pdfDefaultOptions.workerSrc is meant to be overwritten. Just set

pdfDefaultOptions.workerSrc = `/my-app/assets/pdf.worker-${getVersionSuffix(assetsUrl(pdfDefaultOptions.assetsFolder))}${
          pdfDefaultOptions._internalFilenameSuffix
        }.mjs`;

in the constructor of your component or in your module. If you need to support older browsers as well, modify the lambda accordingly:

pdfDefaultOptions.workerSrc = pdfDefaultOptions.needsES5
      ? `/my-app/assets/pdf.worker-${getVersionSuffix(assetsUrl(pdfDefaultOptions.assetsFolder))}-es5.mjs`
      : `/my-app/assets/pdf.worker-${getVersionSuffix(assetsUrl(pdfDefaultOptions.assetsFolder))}${
          pdfDefaultOptions._internalFilenameSuffix
        }.mjs`

I can't do that myself because some users have different folder layouts.

Hollow-Ego commented 2 months ago

It's not that it doesn't sometimes work in our application, but rather that the relative path to the worker javascript file is not correct.

I added your suggested workaround in the following way

pdfDefaultOptions.workerSrc = () => `./pdf.worker-${pdfjsVersion}${pdfDefaultOptions._internalFilenameSuffix}.mjs`;

Since we don't use the bleeding edge version I skipped the getVersionSuffix function and since the "assets" part leads to the error, I also removed this. However this now leads to one request going to the wrong route, which though doesn't produce any errors. But it tells me, that there is still more to the story.

I'm not sure which part of the documentation points to the fact that pdfDefaultOptions.workerSrc is meant to be overwritten. I understood that you don't have to worry about this, as this is handled by ngx-extended-pdfviewer itself. Maybe that needs some clarification.

I'll try explaining why the URI ends up having "assets/assets". To be more concise I will skip all version numbers in the file names and use pdfviewer.net as a base url

In pdf-min.mjs the function _setupFakeWorkerGlobal tries to dynamically import pdf.min.mjs. The value of this.workerSrc by default is ./assets/pdf.worker.mjs. Because pdf-min.mjs is loaded from pdfviewer.net/assets any dynamic import gets resolved relative to that path. So because pdf-min.mjs is loaded from pdfviewer.net/assets, import(this.workerSrc) is resolved as import('pdfviewer.net/assets/assets/pdf.worker.mjs')

Or in other terms pdfviewer.net/assets + ./assets/pdf.worker.mjs => pdfviewer.net/assets/assets/pdf.worker.mjs

static get _setupFakeWorkerGlobal() {
    return shadow(this, "_setupFakeWorkerGlobal", (async () => {
        if (this.#zi)
            return this.#zi;
        return (await import(this.workerSrc)).WorkerMessageHandler // <= this is the dynamic import
    }
    )())
}

By setting workerSrc to just ./pdf.min.mjs the import now resolves correctly as import('pdfviewer.net/assets/pdf.min.mjs') Or in other terms pdfviewer.net/assets + ./pdf.worker.mjs => pdfviewer.net/assets/pdf.worker.mjs

Hollow-Ego commented 2 months ago

Small update I mentioned above, that with the way I added the workaround I got a failing request for the pdf.worker.mjs. Now I know why.

In viewer.mjs in the _initialize() function a new worker is supposed to be created.

const e = new Worker(this.#Ui(t),{
    type: "module"
})

// Where the function#UI
#Ui(t) {
  return window.trustedTypes ? window.pdfViewerSanitizer.createScriptURL(t) : t
}

So in the t is our workerSrc. Since I have set this to './pdf.worker.mjs' and this is not using a dynamic import, the file will be loaded from pdfviewer.net/pdf.worker.mjs. Since this file also isn't stored there, the request fails with 404. Since there is an error event listener added to the worker service, it gets called and ultimately calls _setupFakeWorker. And within this function we try to use a dynamic import again, which then resolves correctly.

Important For anyone reading this and wanting to use the workaround that I was using: DON'T If for the vast majority of the time your worker works correctly, it's better to make use of it, instead of using the fake worker all the time.

EDIT: I realized I misunderstood the pdfDefaultOptions.workerSrc. My brain was constantly telling me that the "assets" part of the workerSrc directly comes from the assetsUrl. Obviously that is not the case. I will investigate further and see where i was led in the wrong direction and how this could possibly be solved

yoenispantoja commented 2 months ago

Hi guys, I have an issue related to same file "viewer-4.5.732.min.mjs". My server in production does not allow files load with .mjs extensions. Any suggestion for this?

stephanrauh commented 2 months ago

@yoenispantoja Yes. Only, your issue isn't related to the issue of this thread. :) And if you say "my server doesn't allow *.mjs files", you make it sound like you can't change that.

That'd be bad, because then your only choice would be to use an old version of the viewer. I think version 19 was the last version using standard *.js files, but I'm not sure.

I tried to revert the migration to mjs files, but I failed. When the pdf.js team embraced the mjs format, they did so thoroughly. There's no way back.

So you have to configure your server. Basically, all you have to do is telling your server to use the correct MIME type. I've described the configuration for several popular servers here: https://pdfviewer.net/extended-pdf-viewer/troubleshooting

stephanrauh commented 2 months ago

@Hollow-Ego First of all, let me thank you. You've put a lot of effort into describing and analyzing this issue. I don't see this every day. That's just great!

stephanrauh commented 2 months ago

@Hollow-Ego Actually, I suggested (or tried to suggest?) using an absolute path. You know your directory layout, so you can easily use absolute paths. I can't, because every developers uses their own directory layout.

You explanation why there's a duplicate /assets/assets/ path is very convincing - but it can't be the whole story. If it was, my showcase had the same problem. Truth be told, the showcase once did suffer from this problem, but I solved it, and now I wonder why the bug pops up again.

I'm not sure which part of the documentation points to the fact that pdfDefaultOptions.workerSrc is meant to be overwritten.

Did you see https://pdfviewer.net/extended-pdf-viewer/options? If you did, please tell me, because it means I should improve the page. Every option documented over there can be overwritten.

Generally speaking, you've got it right: I tried to make sure that the viewer always uses the correct path automatically. I didn't hear complaints for a long time, and that's why I'm so surprised. Maybe you've hit a rare corner case?

Hollow-Ego commented 2 months ago

@Hollow-Ego First of all, let me thank you. You've put a lot of effort into describing and analyzing this issue. I don't see this every day. That's just great!

@stephanrauh You're welcome. And thank your for the quick replies and overall the fact, that you maintain this library. I know my colleagues really love it too.

Hollow-Ego commented 2 months ago

@Hollow-Ego Actually, I suggested (or tried to suggest?) using an absolute path. You know your directory layout, so you can easily use absolute paths. I can't, because every developers uses their own directory layout.

@stephanrauh Ah sorry I misunderstood that. I ended up experimenting with an absolute path in the end anyway. I was trying to set the assets folders with an absolute path, until I realized that I was heading in the wrong direction. I will try to set an absolute path and see if this works as expected. After that I will post result here.

You explanation why there's a duplicate /assets/assets/ path is very convincing - but it can't be the whole story. If it was, my showcase had the same problem. Truth be told, the showcase once did suffer from this problem, but I solved it, and now I wonder why the bug pops up again.

Actually I do believe that your showcase has the same problem. I noticed a few teams when switching between the different examples, the viewer sometimes wouldn't load. Unfortunately I never looked in the console and right now when I was trying to naturally get to the same state it was all working fine. However, as described above, I was able to reproduce this error on the showcase page, using the debug tools. right until import(this.workerSrc) the workerSrc is set correctly as ./assets/pdf.worker.mjs. Maybe this whole thing can be mitigated by having the default workerSrc always be an absolute path. But this could be a breaking change. Overall, I got to admit that I still don't have the full picture of what option is being used where, how the ./assets path is being added to workerSrc and how it gets to pdf.js.

Did you see https://pdfviewer.net/extended-pdf-viewer/options? If you did, please tell me, because it means I should improve the page. Every option documented over there can be overwritten. I did come across this page a lot. Though, you say "can be overwritten" (optionally), which is different from "meant to be overwritten" (recommended or mandatory). The options page reads as if it is optional to overwrite them, if the need arises. And I think that's what you mean. There are a few things I think can be improved, in my opion.

  • Disable the search hotkeys in the demo PDF file. Users that like to hit Ctrl+F to search for a specific option always land in the PDF search, which on this page is not what the user wants (most likely)
  • The only mention of the workerSrc option is under workerPort. And the text makes it sound like you cannot even overwrite this option. That's what I got from it and why I didn't try until you suggested it
  • For me it took a while to understand where to even overwrite those options. While the TypeScript tab makes this clear, it would be nice to also have one sentence mentioning possible places to overwrite them.
  • And a little side note: I find the left navigation bar a bit too narrow to properly read everything. I also wanted to take a look at it, but I don't know if the code for that is in this repo too or lives somewhere else (Edit: I found it and might have a look at it if time allows)

Generally speaking, you've got it right: I tried to make sure that the viewer always uses the correct path automatically. I didn't hear complaints for a long time, and that's why I'm so surprised. Maybe you've hit a rare corner case?

I was wondering that too and it is likely, that this is more of an edge. But it also likely that during development everything is fine and only the end user notice the file not loading. And a user might just assume the file hasn't loaded correctly and just tries again. In our case we've also had no problems during development and only noticed this in a staging environment. Overall, once I can suggest a possible solution, you can decide if that should be included in the library by default or maybe just put it as a solution on https://pdfviewer.net/extended-pdf-viewer/troubleshooting

Hollow-Ego commented 2 months ago

As promised, here is the update and what seems to work for all cases where we need to load the worker files. Using an absolute URL is the key and probably the cleanest solution. Here is how I implemented it in our case

// In the component where the PDF Viewer is used

// import { DOCUMENT } from '@angular/common';
private document = inject(DOCUMENT);

constructor() {
    pdfDefaultOptions.workerSrc = () => {
      const baseURI = this.document.baseURI;
      const assetsFolder = pdfDefaultOptions.assetsFolder;
      const pdfJsVersion = getVersionSuffix(assetsUrl(assetsFolder));
      const fileEnding = pdfDefaultOptions.needsES5 ? '-es5.mjs' : '.mjs';
      return `${baseURI}${assetsFolder}/pdf.worker-${pdfJsVersion}${fileEnding}`;
    };
  }

I split the declarations up to make it easier to modify the structure of the returned workerSrc URI. This way, if anything changes we don't have to remember to update it for es5 and non-es5 versions. Note though, that we could omit this es5 decision, if we always know exactly if we need it or not. But to have this example work more generally and be close to the current default implementation, I kept it. We inject the Document from angular/common to have a proper (Angular) way of accessing the documents base URI.

I could imagine using this implementation as a default. I don't think it would be a breaking change (I might be wrong though) and is a reasonable default. Then with the addition of the improved documentation this option could be overwritten if needed.

stephanrauh commented 2 months ago

Using the baseURI sounds like a good idea.

stephanrauh commented 2 months ago

Maybe it's also a good idea to check whether the URL of the worker exists. That would allow us to use a different approach to generate the path if the default approach fails, or to detect stuff like the duplicate /assets/assets, and if everything fails, we could generate a good error message.

async function checkUrlExists(url) {
  try {
    const response = await fetch(url, { method: 'HEAD' });
    if (response.ok) { 
      // TODO: check if the MIME type is correct
      console.log('URL exists:', response.status);
      return true;
    } else {
      console.log('URL does not exist:', response.status);
      return false;
    }
  } catch (error) {
    console.log('Error:', error);
    return false;
  }
}

checkUrlExists(pdfDefaultOptions.workerSrc());

And maybe we should load and create the worker in TypeScript. IMHO the current implementation is hard to understand and maintain. Here it is:

    let { workerSrc } = PDFWorker;

    try {
      // Wraps workerSrc path into blob URL, if the former does not belong
      // to the same origin.
      if (
        typeof PDFJSDev !== "undefined" &&
        PDFJSDev.test("GENERIC") &&
        !PDFWorker._isSameOrigin(window.location.href, workerSrc)
      ) {
        workerSrc = PDFWorker._createCDNWrapper(
          new URL(workerSrc, window.location).href
        );
      }

      // modified by ngx-extended-pdf-viewer #1512
      const worker = new Worker(this.#generateTrustedURL(workerSrc), { type: "module" });
      // end of modification by ngx-extended-pdf-viewer #1512

In theory, we could load the worker thread as a blob, but that approach breaks CSP (Content Security Policy).

Hollow-Ego commented 2 months ago

Maybe it's also a good idea to check whether the URL of the worker exists. That would allow us to use a different approach to generate the path if the default approach fails, or to detect stuff like the duplicate /assets/assets, and if everything fails, we could generate a good error message.

Agreed. Having a fallback and a worse case error message is more safe. Though I don't think we would be able to check if the path contains duplicate assets/assets and even if it does, it might be intentional. So I think checking if the path exists makes sense and adding a fallback path too. But as for the duplicates we could change the workerSrc to either be a full path or a path relative to the assets folder. Then the new Worker() call would use the combination of assets folder and workerSrc while the import() for the fake worker would only use the workerSrc.

And maybe we should load and create the worker in TypeScript. IMHO the current implementation is hard to understand and maintain.

I mean it's fine, but if you have to change it often, then it makes sense to improve it

linhhn-familysoft commented 2 weeks ago

@stephanrauh Is this bug being fixed and in what version is it expected to be applied?

stephanrauh commented 2 weeks ago

@linhhn-familysoft Unfortunately, this ticket got lost in the big pile of new tickets. However, I suspected this ticket is mostly solved. I just wanted to check if using the baseURI is a good idea or not.

However, you wouldn't ask if everything worked fine. Can you tell me something about your bug?