langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Error: No "GlobalWorkerOptions.workerSrc" specified. #1296

Closed: geekroscom closed this issue 1 year ago

geekroscom commented 1 year ago

LangChain version: 0.0.75
Development environment: Vue 3 + Vite + TypeScript + Electron

This is how I am using it:

yarn add pdf-parse && yarn add pdfjs-dist

import { PDFLoader } from "langchain/document_loaders/fs/pdf";
const loader = new PDFLoader("D:\\202305\\Chat_Desktop\\example_data\\Analysis-and-Comparison-between-Optimism-and-StarkNet.pdf", {
    pdfjs: () => import("pdfjs-dist/legacy/build/pdf.js"),
});
docs = await loader.load();

Error:

Uncaught (in promise) Error: No "GlobalWorkerOptions.workerSrc" specified.
at get workerSrc [as workerSrc] (api.js:2304:11)
at _PDFWorker._initialize (api.js:2118:13)
at new _PDFWorker (api.js:2067:10)
at getDocument (api.js:384:9)
at PDFLoader.parse (D:\202305\Chat_Desktop\release\dist\preload\index.cjs:2903:27)
at async onImport (dialog.vue:161:36)

geekroscom commented 1 year ago

import { PDFLoader } from "langchain/document_loaders/fs/pdf";
const loader = new PDFLoader("D:\\202305\\Chat_Desktop\\example_data\\Analysis-and-Comparison-between-Optimism-and-StarkNet.pdf", {
    pdfjs: () => import("pdfjs-dist/legacy/build/pdf.worker.js"),
});
docs = await loader.load();

Error after switching the import to pdf.worker.js:

Uncaught (in promise) TypeError: getDocument is not a function
at PDFLoader.parse (D:\202305\Chat_Desktop\release\dist\preload\index.cjs:2924:27)
at async onImport (dialog.vue:159:36)
parse @ D:\202305\Chat_Desktop\release\dist\preload\index.cjs:2924
Promise.catch (async)
callWithAsyncErrorHandling @ runtime-core.esm-bundler.js:184
emit @ runtime-core.esm-bundler.js:730
(anonymous) @ runtime-core.esm-bundler.js:7465
handleClick @ button.vue:116
callWithErrorHandling @ runtime-core.esm-bundler.js:173
callWithAsyncErrorHandling @ runtime-core.esm-bundler.js:182
invoker @ runtime-dom.esm-bundler.js:345

geekroscom commented 1 year ago

I have worked around it for now, but I'm not sure what the correct approach is for a release build.

yarn add pdf-parse && yarn add pdfjs-dist

import * as PDFLib from "pdfjs-dist/legacy/build/pdf.js";
const loader = new props.app.api.LangChain.PDFLoader(props.app.page.database.dialog.form.file_value, {
    pdfjs: () => {
        // Set the worker URL before PDFLoader calls getDocument().
        PDFLib.GlobalWorkerOptions.workerSrc = "https://cdn.*****.com/****/pdf.worker.min.js";
        return PDFLib;
    },
});
docs = await loader.load();

The pdf.worker.min.js file comes from the pdfjs-dist/legacy/build/ directory. I uploaded it to my own server and set PDFLib.GlobalWorkerOptions.workerSrc to the URL of that file.
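
For anyone who wants to reuse this, the workaround above can be pulled into a small helper. This is only a sketch: `withWorkerSrc` and the `PdfjsLike` interface are names made up for illustration, not part of LangChain or pdf.js, and the worker URL is a placeholder for wherever you host your copy of pdf.worker.min.js. It assumes, as in the snippet above, that PDFLoader accepts a `pdfjs` option returning the pdf.js module.

```typescript
// Minimal shape of the part of the pdf.js module this sketch touches.
// (Hypothetical interface; the real module exposes much more.)
interface PdfjsLike {
  GlobalWorkerOptions: { workerSrc: string };
}

// Hypothetical helper: returns a factory suitable for a `pdfjs` option.
// When the factory is called, it sets workerSrc on the already-imported
// module and then hands that same module back wrapped in a Promise.
function withWorkerSrc<T extends PdfjsLike>(
  pdfjs: T,
  workerSrc: string
): () => Promise<T> {
  return () => {
    pdfjs.GlobalWorkerOptions.workerSrc = workerSrc;
    return Promise.resolve(pdfjs);
  };
}

// Intended usage (commented out because it needs langchain and pdfjs-dist installed):
//
//   import * as PDFLib from "pdfjs-dist/legacy/build/pdf.js";
//   import { PDFLoader } from "langchain/document_loaders/fs/pdf";
//
//   const loader = new PDFLoader(filePath, {
//     pdfjs: withWorkerSrc(PDFLib, "<URL of your hosted pdf.worker.min.js>"),
//   });
//   const docs = await loader.load();
```

The worker URL is only assigned when the loader actually invokes the factory, so the helper has no effect until a PDF is loaded. Whether a remote URL is acceptable in a packaged Electron app is a separate question that this sketch does not answer.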