langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Error when using the PPTX loader #4673

Closed ajoublin closed 6 months ago

ajoublin commented 6 months ago


Example Code

import { PPTXLoader } from "langchain/document_loaders/fs/pptx";

// TODO: obtain the buffer from an input file upload via a POST API
declare const buffer: Buffer;
const blobBuffer = new Blob([buffer]);
const loader = new PPTXLoader(blobBuffer);
const docs = await loader.load();

Error Message and Stack Trace (if applicable)

Collecting page data  ...Error: ENOENT: no such file or directory, open 'C:\Users\XXXXXXXX\test\data\05-versions-space.pdf'
    at Object.openSync (node:fs:581:18)
    at Object.readFileSync (node:fs:457:35)
    at 59546 (C:\Users\XXXXXXXX\.next\server\app\api\upload\route.js:48:33102)
    at t (C:\Users\XXXXXXXX\.next\server\webpack-runtime.js:1:143)
    at 41793 (C:\Users\XXXXXXXX\.next\server\app\api\upload\route.js:48:17089)
    at t (C:\Users\XXXXXXXX\.next\server\webpack-runtime.js:1:143)
    at 60142 (C:\Users\XXXXXXXX\.next\server\app\api\upload\route.js:21:659)
    at t (C:\Users\XXXXXXXX\.next\server\webpack-runtime.js:1:143)
    at __webpack_exec__ (C:\Users\XXXXXXXX\.next\server\app\api\upload\route.js:106:467703)
    at C:\Users\XXXXXXXX\.next\server\app\api\upload\route.js:106:467802 {
  errno: -4058,
  code: 'ENOENT',
  syscall: 'open',
  path: 'C:\\Users\\XXXXXXXX\\test\\data\\05-versions-space.pdf'
}

Description

I'm trying to use the LangChain PPTX loader to create documents from a PowerPoint file uploaded via a POST call. I expect load() to return the documents. Instead, I get a 404 error when I try to reach my route, and the error shown above appears when compiling my Next.js project.

System Info

langchain@0.1.25 | MIT | deps: 18 | versions: 255
Typescript bindings for langchain
https://github.com/langchain-ai/langchainjs/tree/main/langchain/

keywords: llm, ai, gpt3, chain, prompt, prompt engineering, chatgpt, machine learning, ml, openai, embeddings, vectorstores

dist
.tarball: https://registry.npmjs.org/langchain/-/langchain-0.1.25.tgz
.shasum: afd0faf2eadfb5e6e1d57bb3bbc5731918e62135
.integrity: sha512-sfEChvr4H2CklHdSByNBbytwBrFhgtA5kPOnwcBrxuXGg1iOaTzhVxQA0QcNcQucI3hZrsNbZjxGp+Can1ooZQ==
.unpackedSize: 4.0 MB

dependencies:
@anthropic-ai/sdk: ^0.9.1     @langchain/openai: ~0.0.14    js-tiktoken: ^1.0.7           langchainhub: ~0.0.8          openapi-types: ^12.1.3        yaml: ^2.2.1
@langchain/community: ~0.0.33 binary-extensions: ^2.2.0     js-yaml: ^4.1.0               langsmith: ~0.1.7             p-retry: 4                    zod-to-json-schema: ^3.22.3
@langchain/core: ~0.1.36      expr-eval: ^2.0.2             jsonpointer: ^5.0.1           ml-distance: ^4.0.0           uuid: ^9.0.0                  zod: ^3.22.4

maintainers:
- nfcampos <nuno@boringbits.io>
- jacoblee93 <jacoblee93@gmail.com>
- davidduong <david@duong.cz>
- hwchase17 <hw.chase.17@gmail.com>
- basproul <braceasproul@gmail.com>

dist-tags:
latest: 0.1.25     next: 0.1.19-rc.2

published a week ago by basproul <braceasproul@gmail.com>

The code was run both in a local environment and on a live version deployed on Vercel. Node version: v20.10.0

dosubot[bot] commented 6 months ago

Hey @ajoublin, good to see you diving into new challenges! 🚀

Based on the error message you provided, it seems like the issue is related to the file path you're providing to the PPTXLoader. The error ENOENT: no such file or directory typically indicates that the file you're trying to access doesn't exist at the specified path.

In your code, you're creating a Blob from a Buffer and passing it to the PPTXLoader. However, the PPTXLoader expects a file path as an argument, not a Blob or Buffer. This could be the reason why you're encountering the error.

Here's how you should use the PPTXLoader:

import { PPTXLoader } from "langchain/document_loaders/fs/pptx";

const filePath = 'path/to/your/file.pptx'; // Replace with your file path
const loader = new PPTXLoader(filePath);
const docs = await loader.load();

Ensure that the file path you provide exists and is accessible. If you're trying to load the file from a POST request, you might need to save the file to your server first, and then pass the file path to the PPTXLoader.
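The "save the file first" suggestion could be sketched roughly as follows. `withTempFile` is a hypothetical helper, not part of LangChain, and this thread does not confirm that going through a file path avoids the ENOENT error; it only illustrates the approach described above using Node's standard library.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not a LangChain API): writes uploaded bytes to a
// temporary file, passes the file path to `use`, and always removes the
// temporary directory afterwards, even if `use` throws.
async function withTempFile<T>(
  data: Uint8Array,
  fileName: string,
  use: (filePath: string) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "upload-"));
  const filePath = join(dir, fileName);
  try {
    await writeFile(filePath, data);
    return await use(filePath);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

In a route handler, this might be called as `withTempFile(buffer, "upload.pptx", (p) => new PPTXLoader(p).load())`, where `buffer` holds the uploaded bytes.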

Regarding the 404 error, it's likely related to your NextJS routing configuration rather than the LangChain JS library. Make sure that the route you're trying to access is correctly defined in your NextJS project.

If you continue to encounter issues, please provide more details about your implementation, specifically how you're handling the file upload in your POST API and how you're passing the file to the PPTXLoader.

For more information on how to use the PPTXLoader, you can refer to the LangChain JS documentation and the example code in the repository.

ajoublin commented 6 months ago

According to the documentation provided by LangChain:

export class PPTXLoader extends BufferLoader {
  constructor(filePathOrBlob: string | Blob) {
    super(filePathOrBlob);
  }
}

The PPTXLoader constructor accepts a Blob as an argument (which works fine for TextLoader / CSVLoader / DocxLoader / PDFLoader, by the way).

As for the 404 error, it only shows up when I add this line: loader = new PPTXLoader(blobBuffer), so I'm 100% sure it is not related to my Next.js configuration.

jacoblee93 commented 6 months ago

It's Node only: https://js.langchain.com/docs/integrations/document_loaders/file_loaders/pptx

This is going to sound really dumb, but you could try just adding a file at that path with that exact name at your project root:

https://github.com/jacoblee93/dlai-langchainjs/tree/main/test/data

That works for the PDF loader in some environments sometimes, and I am assuming that pdf-parse is an indirect dep of the PPTX loader.
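A minimal sketch of that workaround, assuming the relative path `test/data/05-versions-space.pdf` taken from the ENOENT message above. `addPlaceholderPdf` is a hypothetical helper, and `sourcePdf` can be any PDF you already have (for example, the one from the linked repository). As noted, this is only reported to help in some environments.

```typescript
import { copyFile, mkdir } from "node:fs/promises";
import { dirname, join } from "node:path";

// Relative path taken from the ENOENT message in this issue.
const PLACEHOLDER_PATH = join("test", "data", "05-versions-space.pdf");

// Hypothetical helper: copies an existing PDF to the location the bundled
// code tries to read, relative to `projectRoot`, and returns the target path.
async function addPlaceholderPdf(
  projectRoot: string,
  sourcePdf: string,
): Promise<string> {
  const target = join(projectRoot, PLACEHOLDER_PATH);
  // Create test/data/ if it does not exist yet.
  await mkdir(dirname(target), { recursive: true });
  await copyFile(sourcePdf, target);
  return target;
}
```

Running it once with the project root before the build, e.g. `addPlaceholderPdf(process.cwd(), "some.pdf")`, would put the file where the stack trace looks for it.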

hzeyuan commented 5 months ago

I have the same problem.