mozilla / pdf.js

PDF Reader in JavaScript
https://mozilla.github.io/pdf.js/
Apache License 2.0

No "GlobalWorkerOptions.workerSrc" specified. #10478

Closed qinyuhua closed 5 years ago

qinyuhua commented 5 years ago

if (!fallbackWorkerSrc && typeof document !== 'undefined') {
  var pdfjsFilePath = document.currentScript && document.currentScript.src;
  if (pdfjsFilePath) {
    fallbackWorkerSrc = pdfjsFilePath.replace(/(\.(?:min\.)?js)(\?.*)?$/i, '.worker$1$2');
  }
}

Sometimes document.currentScript === null, so pdfjsFilePath === null and fallbackWorkerSrc is never set. getWorkerSrc() then throws:

function getWorkerSrc() {
  if (_worker_options.GlobalWorkerOptions.workerSrc) {
    return _worker_options.GlobalWorkerOptions.workerSrc;
  }
  if (typeof fallbackWorkerSrc !== 'undefined') {
    return fallbackWorkerSrc;
  }
  throw new Error('No "GlobalWorkerOptions.workerSrc" specified.');
}
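To make the failure mode concrete, here is a minimal sketch of the fallback derivation as a standalone function. The name deriveFallbackWorkerSrc is hypothetical (it is not part of pdf.js), and the regular expression assumes the escaped form of the pattern in the snippet above.

```javascript
// Hypothetical helper mirroring the fallback logic quoted above; not a pdf.js API.
function deriveFallbackWorkerSrc(scriptSrc) {
  // document.currentScript is null for module scripts and inside callbacks,
  // so callers may well end up passing null or "" here.
  if (!scriptSrc) {
    return undefined;
  }
  // Insert ".worker" before the ".js" / ".min.js" extension, keeping any query string.
  return scriptSrc.replace(/(\.(?:min\.)?js)(\?.*)?$/i, ".worker$1$2");
}

console.log(deriveFallbackWorkerSrc("https://example.com/lib/pdf.min.js?v=1"));
console.log(deriveFallbackWorkerSrc(null));
```

When the function returns undefined, nothing is left for getWorkerSrc() to fall back on, which matches the error in the issue title.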

Snuffleupagus commented 5 years ago

You should always specify the workerSrc explicitly, i.e. by setting pdfjsLib.GlobalWorkerOptions.workerSrc before calling pdfjsLib.getDocument, since the fallback is only a best effort solution which is not guaranteed to work correctly in every situation.
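The ordering described above can be sketched as a small helper. The helper name openDocument and the worker path in the usage note are assumptions for illustration; pdfjsLib is passed in as a parameter so the sketch does not depend on how the library is imported in a given project.

```javascript
// Sketch: configure the worker first, then load the document.
// `pdfjsLib` is whatever object your import of pdf.js produced.
function openDocument(pdfjsLib, workerSrc, source) {
  // Set the worker location before the first getDocument call.
  pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc;
  // getDocument returns a loading task; its `promise` resolves to the document.
  return pdfjsLib.getDocument(source).promise;
}
```

A call might look like openDocument(pdfjsLib, "/assets/pdf.worker.min.js", "/file.pdf"), where both paths are placeholders for files your own server actually hosts.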

luistak commented 5 years ago

You should try this:

  const pdfjs = await import('pdfjs-dist/build/pdf');
  const pdfjsWorker = await import('pdfjs-dist/build/pdf.worker.entry');

  pdfjs.GlobalWorkerOptions.workerSrc = pdfjsWorker;

  ...
ym78900 commented 4 years ago

You should try this:

  const pdfjs = await import('pdfjs-dist/build/pdf');
  const pdfjsWorker = await import('pdfjs-dist/build/pdf.worker.entry');

  pdfjs.GlobalWorkerOptions.workerSrc = pdfjsWorker;

  ...

I have had difficulty using the idea you gave in React. It doesn't work when the component is mounted. When I use the official script everything is fine, but it doesn't work with pdfjs-dist.

const pdfjsLib = window['pdfjs-dist/build/pdf'];
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

Any ideas? I would prefer not to use the script.

luistak commented 4 years ago

Honestly, I would use a React abstraction over pdf.js, since pdf.js doesn't officially support React :(

joseDaKing commented 3 years ago

Please help, I still get this error when I use the library with svelte-kit

ym78900 commented 3 years ago

Please help, I still get this error when I use the library with svelte-kit

What I did was append the script at the very beginning, when my component is mounted. I guess in your case you need to use the script alongside the other ones you're using.

https://mozilla.github.io/pdf.js/build/pdf.js

When you use this, GlobalWorkerOptions becomes available.

I hope it helps, although I used it in React.

joseDaKing commented 3 years ago

Thanks for the information, so I cannot just import it like a regular package?

ym78900 commented 3 years ago

Thanks for the information, so I cannot just import it like a regular package?

No, unfortunately. I tried it that way and it didn't work.

joseDaKing commented 3 years ago

Okay, I think I know how to fix it now.

sidd98 commented 3 years ago

I have been stuck on this problem using version 2.4.456 of pdfjs-dist. After checking the library's webpack file at its root, I did this to fix it in a React component:

import PdfjsWorker from "pdfjs-dist/build/pdf.worker.js";
import PDFJS, { getDocument } from "pdfjs-dist";

PDFJS.workerSrc = "pdfjs-dist/build/pdf.worker.js";
PDFJS.GlobalWorkerOptions.workerSrc = "pdfjs-dist/build/pdf.worker.js";
PDFJS.GlobalWorkerOptions.workerPort = new PdfjsWorker();

iamrmin commented 2 years ago

Has anyone faced the error "FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory" when using

import { pdfjs } from 'react-pdf'
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.entry'

pdfjs.GlobalWorkerOptions.workerSrc = pdfjsWorker

Both react-scripts build and react-scripts start crash with the error mentioned above. Most importantly, it only occurs when I enable source maps.

kvengl commented 2 years ago

Has anyone faced the error "FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory" when using

import { pdfjs } from 'react-pdf'
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.entry'

pdfjs.GlobalWorkerOptions.workerSrc = pdfjsWorker

Both react-scripts build and react-scripts start crash with the error mentioned above. Most importantly, it only occurs when I enable source maps.

I fixed it as follows:

1) Run npm run eject.
2) Open {project name folder}/config/webpack.config.js.
3) Add the exclusion /node_modules\/pdfjs-dist/ here:

module: {
      strictExportPresence: true,
      rules: [
        // Handle node_modules packages that contain sourcemaps
        shouldUseSourceMap && {
          enforce: 'pre',
          exclude: [/@babel(?:\/|\\{1,2})runtime/, /node_modules\/pdfjs-dist/],
          test: /\.(js|mjs|jsx|ts|tsx|css)$/,
          loader: require.resolve('source-map-loader'),
        },
...
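One detail worth checking in the exclusion above: the added pattern only matches forward-slash paths, while the neighbouring @babel pattern in the same array also accepts backslashes. The sketch below compares the original pattern with an assumed cross-platform variant; the sample paths are made up for illustration.

```javascript
// The pattern from the snippet above matches forward-slash paths only.
const posixOnly = /node_modules\/pdfjs-dist/;
// Assumed alternative: a character class that accepts either path separator.
const crossPlatform = /node_modules[\\/]pdfjs-dist/;

// Made-up sample paths in POSIX and Windows form.
const posixPath = "/app/node_modules/pdfjs-dist/build/pdf.js";
const windowsPath = "C:\\app\\node_modules\\pdfjs-dist\\build\\pdf.js";

console.log(posixOnly.test(posixPath), posixOnly.test(windowsPath));
console.log(crossPlatform.test(posixPath), crossPlatform.test(windowsPath));
```

If the build ever runs on Windows, the second form is probably the safer choice for the exclude entry.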
iamrmin commented 2 years ago

@kvengl npm run eject is dangerous, in my opinion. It removes the react-scripts wrapper, and you are on your own if something goes wrong, e.g. dependency conflicts. react-scripts is maintained by a big community, and you can trust new versions. I have always run into issues when dealing with webpack directly.

I fixed it with a simple solution. You can check https://github.com/mozilla/pdf.js/issues/8305

ArslanAmeer commented 1 year ago

For people looking for an Angular fix:

In imports:

import * as PDFJS from 'pdfjs-dist';
// @ts-ignore
import pdfjsWorker from 'pdfjs-dist/build/pdf.worker.entry'
PDFJS.GlobalWorkerOptions.workerSrc = pdfjsWorker;

And where you load the PDF:

  private async loadPDF() {
    const pdf = await PDFJS.getDocument(this.document.file).promise;
    const page = await pdf.getPage(1);
    const viewport = page.getViewport({ scale: 1 });
    this.canvas.nativeElement.width = viewport.width;
    this.canvas.nativeElement.height = viewport.height;
    this.context = this.canvas.nativeElement.getContext('2d')!;
    const renderContext = {
      canvasContext: this.context,
      viewport: viewport
    };
    await page.render(renderContext).promise;
    this.isLoading = false;
  }

This will resolve your issue right away, but you may now see a "Warning: Setting up fake worker." message in the console.

To resolve this, change your pdf.js imports to just:

import * as PDFJS from 'pdfjs-dist';
PDFJS.GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${PDFJS.version}/pdf.worker.js`;

Angular: v15.1.0 pdfjs-dist: v3.3.122
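Because the CDN worker has to match the installed library version, it can help to build the URL in one place. This is a minimal sketch: cdnWorkerUrl is a hypothetical helper, the three-part version check is an assumption about how version strings look, and the file name follows the 3.x layout from the comment above (later comments in this thread report .mjs worker files from v4 onward).

```javascript
// Hypothetical helper: build a worker URL pinned to the running library version.
function cdnWorkerUrl(version) {
  // Assumed format: plain "major.minor.patch", e.g. "3.3.122".
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Unexpected pdf.js version string: ${version}`);
  }
  // An explicit https: scheme avoids protocol-relative URLs resolving against
  // file: or other non-HTTP page origins.
  return `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/pdf.worker.js`;
}

console.log(cdnWorkerUrl("3.3.122"));
```

In the Angular snippet this would be used as PDFJS.GlobalWorkerOptions.workerSrc = cdnWorkerUrl(PDFJS.version).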

z0d14c commented 1 year ago

For me, what seemed to work was this:

import "pdfjs-dist/build/pdf.worker.entry";

It appears that this code will attach the pdfJsworker to the window object that I presume is what gets fallen back on when you run getDocument. No shade, but I'm very curious about the design decision or technical constraint behind needing to do this.

trandaison commented 1 year ago

@z0d14c was right. Take a look at the file pdfjs-dist/build/pdf.worker.entry.js:

(typeof window !== "undefined"
  ? window
  : {}
).pdfjsWorker = require("./pdf.worker.js");

The worker is already assigned to window.pdfjsWorker. You only need an import statement; no assignment is needed.

import 'pdfjs-dist/build/pdf.worker.entry';
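The mechanism can be sketched without the library. installWorkerEntry reproduces the side effect of the entry file shown above; findMainThreadHandler is an assumption about how the library side might look the global up, not confirmed pdf.js code. Both names are hypothetical.

```javascript
// Sketch of the side effect performed by pdf.worker.entry, with the global object
// and the worker module passed in so the example can run anywhere.
function installWorkerEntry(globalObject, workerModule) {
  globalObject.pdfjsWorker = workerModule;
}

// Assumed lookup on the library side: use the handler if the global is present.
function findMainThreadHandler(globalObject) {
  const worker = globalObject.pdfjsWorker;
  return worker && worker.WorkerMessageHandler ? worker.WorkerMessageHandler : null;
}

// Usage with a stand-in global and a stand-in worker module.
const fakeGlobal = {};
console.log(findMainThreadHandler(fakeGlobal)); // nothing installed yet
installWorkerEntry(fakeGlobal, { WorkerMessageHandler: { name: "handler" } });
console.log(findMainThreadHandler(fakeGlobal).name);
```

If that assumption holds, the worker code runs on the main thread rather than in a real Web Worker, which would be consistent with commenters still seeing the "Setting up fake worker" warning.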
break69 commented 1 year ago

Everything went back to normal by doing it this way for me. Thanks to luistak who helped me a lot

import * as pdfjsLib from "pdfjs-dist";
// Import the worker correctly to avoid the message "Warning: Setting up fake worker"
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry"; 

pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorker;

the "Warning: Setting up fake worker" message came back, but I'll leave the idea in case it helps.

charkhaniakash commented 1 year ago

This will work now. Use this:

import { pdfjs } from 'react-pdf';

pdfjs.GlobalWorkerOptions.workerSrc = `//unpkg.com/pdfjs-dist@${pdfjs.version}/build/pdf.worker.min.js`;

dr1602 commented 1 year ago

For me, what seemed to work was this:

import "pdfjs-dist/build/pdf.worker.entry";

It appears that this code will attach the pdfJsworker to the window object that I presume is what gets fallen back on when you run getDocument. No shade, but I'm very curious about the design decision or technical constraint behind needing to do this.

Amazing, it worked for me on a ReactJS & TypeScript project!

daveleee commented 1 year ago

It's specified in the documentation: https://github.com/wojtekmaj/react-pdf#configure-pdfjs-worker

juansebastianl commented 1 year ago

I came across this issue while trying to use pdfJS in a svelte project and I had to do something slightly different:

import * as pdfjsLib from 'pdfjs-dist/build/pdf';

// this import is needed to configure a default worker for pdfjs
import * as pdfjsWorker from "pdfjs-dist/build/pdf.worker.mjs";
pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorker;

This got past the initial error but still produces the "Setting up fake worker" warning. However, the actual functionality seems to be working fine.

Rajat16nov commented 11 months ago

I came across this issue while trying to use pdfJS in a svelte project and I had to do something slightly different:

import * as pdfjsLib from 'pdfjs-dist/build/pdf';

// this import is needed to configure a default worker for pdfjs
import * as pdfjsWorker from "pdfjs-dist/build/pdf.worker.mjs";
pdfjsLib.GlobalWorkerOptions.workerSrc = pdfjsWorker;

This got past the initial error but still produces the "Setting up fake worker" warning. However, the actual functionality seems to be working fine.

Thanks a lot! This is the only solution that worked for me on React.

JHarrisGTI commented 10 months ago

For me, what seemed to work was this:

import "pdfjs-dist/build/pdf.worker.entry";

This got my Angular app working.

When I upgraded to Angular v17 and pdf.js v4, I had to change it to:

import "pdfjs-dist/build/pdf.worker.mjs";

cosimopolito commented 10 months ago

Has anyone figured out this error with pdfjs-dist and Angular 17?
./node_modules/pdfjs-dist/build/pdf.mjs - Error: Module parse failed: The top-level-await experiment is not enabled (set experiments.topLevelAwait: true to enabled it) File was processed with these loaders:
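The error message itself names the webpack option to switch on. Below is a minimal sketch of doing that without discarding other settings. withTopLevelAwait is a hypothetical helper, and how you hook a custom webpack config into an Angular build is not covered here and depends on your setup.

```javascript
// Hypothetical helper: returns a copy of a webpack config object with the
// experiment named in the error message enabled, keeping other experiments.
function withTopLevelAwait(config) {
  return {
    ...config,
    experiments: { ...(config.experiments || {}), topLevelAwait: true },
  };
}

// Usage with a stand-in config object.
const patched = withTopLevelAwait({ mode: "production", experiments: { asyncWebAssembly: true } });
console.log(patched.experiments);
```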

Soviut commented 10 months ago

The solution from the react-pdf docs works and gets rid of the "Setting up fake worker" warning.

import * as pdfjs from 'pdfjs-dist'

pdfjs.GlobalWorkerOptions.workerSrc = new URL(
  'pdfjs-dist/build/pdf.worker.min.mjs',
  import.meta.url
).toString()

However, I could see this encountering build issues unless you include the minified file in your build pipeline.
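To see why the bundler has to be involved, it helps to look at what plain URL resolution does with that path. The sketch below uses a hypothetical base URL standing in for import.meta.url in a deployed bundle; resolveWorkerUrl is an illustrative name, not a library function.

```javascript
// Resolve a worker path against a base URL, exactly as `new URL(path, base)` does.
function resolveWorkerUrl(relativePath, baseUrl) {
  return new URL(relativePath, baseUrl).toString();
}

// Hypothetical base, standing in for import.meta.url of a deployed bundle.
console.log(
  resolveWorkerUrl("pdfjs-dist/build/pdf.worker.min.mjs", "https://example.com/assets/index.js")
);
```

At runtime the path is simply resolved relative to the script's directory; nothing looks inside node_modules. The pattern therefore relies on the bundler recognising it at build time and emitting the worker file next to the bundle, which is presumably the build concern raised above.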

qstiegler commented 10 months ago

Found a solution with Angular which also resolves the "Setting up fake worker" warning:

  1. Add the following to the assets section of angular.json:
{
    "glob": "pdf.worker.min.mjs",
    "input": "./node_modules/pdfjs-dist/build",
    "output": "./assets"
}

Then somewhere globally in your code (I do it in app.module.ts), add the following lines:

import * as pdfjsDist from 'pdfjs-dist';

pdfjsDist.GlobalWorkerOptions.workerSrc = 'assets/pdf.worker.min.mjs';

Now the worker runs smoothly and no warning appears without using a CDN 🎉

Shanwer commented 8 months ago

Found a solution with Angular which also resolves the "Setting up fake worker" warning:

  1. Add the following to the assets section of angular.json:
{
    "glob": "pdf.worker.min.mjs",
    "input": "./node_modules/pdfjs-dist/build",
    "output": "./assets"
}

Then somewhere globally in your code (I do it in app.module.ts), add the following lines:

import * as pdfjsDist from 'pdfjs-dist';

pdfjsDist.GlobalWorkerOptions.workerSrc = 'assets/pdf.worker.min.mjs';

Now the worker runs smoothly and no warning appears without using a CDN 🎉

This works on yarn, thank you!

yulluone commented 5 months ago

For me, what seemed to work was this:

import "pdfjs-dist/build/pdf.worker.entry";

It appears that this code will attach the pdfJsworker to the window object that I presume is what gets fallen back on when you run getDocument. No shade, but I'm very curious about the design decision or technical constraint behind needing to do this.

Thank you @z0d14c, this worked for me. It does attach pdfjsWorker to the window. I'm also very interested in why this works.

For those coming from Nuxt or looking to extract text from a PDF, here is my utility function for that:

import "pdfjs-dist/build/pdf.worker.mjs";

// Function to extract text from a PDF file
export async function extractTextFromPDF(file: File): Promise<string[]> {
  // dynamic import to avoid ssr issues
  const { getDocument } = await import("pdfjs-dist");

  // Read the file as an array buffer
  const arrayBuffer = await file.arrayBuffer();

  // Load the PDF document
  const pdfDocument = await getDocument({ data: arrayBuffer }).promise;

  const pagesText: string[] = [];

  // Loop through each page
  for (let i = 0; i < pdfDocument.numPages; i++) {
    const page = await pdfDocument.getPage(i + 1); // Pages are 1-based in pdfjs

    // Get the text content of the page
    const textContent = await page.getTextContent();

    // Extract and join the text items into a single string
    const pageText = textContent.items
      .map((item) => (item as any).str)
      .join(" ");

    pagesText.push(pageText);
  }

  return pagesText;
}
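One possible refinement of the joining step, sketched in plain JavaScript: the cast in the utility above assumes every item has a str property. If some items do not (assumed here to be marked-content markers, which depends on the options passed to getTextContent), join would turn them into empty strings and leave doubled spaces. joinTextItems is a hypothetical helper name.

```javascript
// Join the text of one page's items, skipping entries that carry no `str`
// (assumed here to be marked-content markers rather than text runs).
function joinTextItems(items) {
  return items
    .filter((item) => typeof item.str === "string")
    .map((item) => item.str)
    .join(" ");
}

// Usage with stand-in items shaped like a page's textContent.items.
console.log(joinTextItems([{ str: "Hello" }, { type: "beginMarkedContent" }, { str: "world" }]));
```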
alvaroman23 commented 5 months ago

Does anyone know how to fix this with Vue 3? Please, I've been stuck on this for a week! It's so strange: it works on localhost, but when I deploy to Amplify and access it through the domain URL, I get this error:

index-DGRtMnYU.js:202 Error loading PDF: Error: Setting up fake worker failed: "Failed to fetch dynamically imported module: https://test/node_modules/pdfjs-dist/build/pdf.worker.mjs".
    at index-DGRtMnYU.js:192:171740
Soviut commented 5 months ago

@alvaroman23 It's telling you what the problem is. It's trying to import a module from https://test/ which doesn't exist.

Have a look at this thread. https://github.com/vitejs/vite/issues/11804

In one case it turned out to be an ad blocker that was preventing an import. However, in your case it looks like it's assuming that your path to the node module is a URL.

Also, look further up in this thread, because I posted a solution that I was using with Vue 3 + Vite.
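A quick way to catch this class of problem before deploying is to check whether the configured worker URL still points into node_modules. The helper below is a hypothetical diagnostic, not part of pdf.js or Vite; the URLs in the usage lines come from the error above and from the Angular assets example earlier in the thread.

```javascript
// Hypothetical check: flag worker URLs that still point into node_modules,
// a directory that a production host usually does not serve.
function pointsIntoNodeModules(workerUrl, pageUrl) {
  const resolved = new URL(workerUrl, pageUrl);
  return resolved.pathname.split("/").includes("node_modules");
}

console.log(pointsIntoNodeModules("https://test/node_modules/pdfjs-dist/build/pdf.worker.mjs", "https://test/"));
console.log(pointsIntoNodeModules("assets/pdf.worker.min.mjs", "https://test/"));
```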

lakshminarayanan002 commented 4 months ago

Does anyone know how to fix this with Vue 3? Please, I've been stuck on this for a week! It's so strange: it works on localhost, but when I deploy to Amplify and access it through the domain URL, I get this error:

index-DGRtMnYU.js:202 Error loading PDF: Error: Setting up fake worker failed: "Failed to fetch dynamically imported module: https://test/node_modules/pdfjs-dist/build/pdf.worker.mjs".
    at index-DGRtMnYU.js:192:171740

Same error here; still searching for a fix.

ngocxxu commented 4 months ago
import "pdfjs-dist/build/pdf.worker.mjs";

Thank you, I applied this import in my React project and it worked :D

iamandrewluca commented 3 months ago

At the time of writing, I think the right solution for integrating pdfjs-dist with Vite is this one:

"vite": "^5.3.4",
"pdfjs-dist": "^4.4.168",

import workerSrc from 'pdfjs-dist/build/pdf.worker?worker&url'
import * as pdfjs from 'pdfjs-dist'

pdfjs.GlobalWorkerOptions.workerSrc = workerSrc

pdfjs.getDocument('/bitcoin.pdf').promise.then((pdf) => {
    console.log(pdf);
})

https://vitejs.dev/guide/assets.html#importing-script-as-a-worker

Webpack should also be able to do something like this: https://webpack.js.org/guides/web-workers/

johannesmutter commented 2 weeks ago

For Vite + SvelteKit:

import * as pdfjs from 'pdfjs-dist/build/pdf'
import 'pdfjs-dist/build/pdf.worker.entry'

if (typeof window !== 'undefined' && 'Worker' in window) {
  pdfjs.GlobalWorkerOptions.workerSrc = new URL(
    'pdfjs-dist/build/pdf.worker.min.js',
    import.meta.url
  ).toString()
}
PhilipJovanovic00 commented 1 week ago

For plain JS, I had to use the following code:

import * as pdfjsworker from "pdfjs-dist/build/pdf.worker.mjs"
import * as pdfjs from "pdfjs-dist/build/pdf.min.mjs"

pdfjs.workerSrc = pdfjsworker;
let loadingTasks = pdfjs.getDocument(fileUrl);
let fileCheckPromise = await loadingTasks.promise;

This gave me an Invalid 'workerSrc' type error:

pdfjs.GlobalWorkerOptions.workerSrc = workerSrc

https://stackoverflow.com/a/26291032
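The "Invalid workerSrc type" error is consistent with a setter that only accepts strings, whereas import * as ... produces a module namespace object. The class below is a minimal stand-in written for illustration under that assumption; it is not the pdf.js implementation.

```javascript
// Minimal stand-in for a string-only workerSrc setter; not the pdf.js class.
class WorkerOptionsSketch {
  #src = "";

  get workerSrc() {
    return this.#src;
  }

  set workerSrc(value) {
    // Anything other than a URL string (for example an imported module object)
    // is rejected up front.
    if (typeof value !== "string") {
      throw new Error("Invalid `workerSrc` type.");
    }
    this.#src = value;
  }
}

const options = new WorkerOptionsSketch();
options.workerSrc = "/assets/pdf.worker.min.mjs"; // accepted: a string URL
try {
  options.workerSrc = { WorkerMessageHandler: {} }; // rejected: an object
} catch (error) {
  console.log(error.message);
}
```

Under this assumption, the value assigned to GlobalWorkerOptions.workerSrc should be a URL string pointing at the worker file, as in the Vite and Angular examples earlier in the thread.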