langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

How to load PDF files from external links in Node.js, similar to the OnlinePDFLoader function used in the Python docs #3787

Closed araza110 closed 8 months ago

araza110 commented 8 months ago

Load PDF from an external link

I want to use the PDFLoader() function, but instead of passing a file path, I want to pass an external PDF link, such as a link that is uploaded on S3. In Python, I have seen a function: loader = OnlinePDFLoader("https://arxiv.org/pdf/2302.03803.pdf"). However, I can't find this function in the JS/TS documentation. Is there an alternative way to achieve this?

dosubot[bot] commented 8 months ago

🤖

Hello @araza110 👋,

I'm Dosu, a friendly bot here to help you out while we're waiting for a human maintainer! I'm here to assist you in solving bugs, answer your questions, and guide you in contributing to our repository. Let's get started, shall we?

Yes, you can load a PDF from an external link in JavaScript or TypeScript using the WebPDFLoader class from the langchainjs library. WebPDFLoader takes a Blob rather than a URL, so you first need to fetch the PDF from the external link and read the response as a Blob. Here's how you can do it:

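import { WebPDFLoader } from "langchain/document_loaders/web/pdf";

async function loadPDF(url) {
    // Download the PDF; fetch() does not reject on HTTP errors, so check the status.
    const response = await fetch(url);
    if (!response.ok) {
        throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
    }
    // Read the response body as a Blob, which is what WebPDFLoader expects.
    const data = await response.blob();
    const loader = new WebPDFLoader(data);
    const docs = await loader.load();
    console.log({ docs });
    return docs;
}

loadPDF("https://arxiv.org/pdf/2302.03803.pdf").catch(console.error);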

In this code, fetch() retrieves the PDF from the external link, response.blob() reads the response body as a Blob, and that Blob is passed to the WebPDFLoader constructor. Calling loader.load() then parses the PDF and returns the resulting documents.
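
If the link may return something other than a PDF (for example, an HTML error page from an expired S3 link), it can help to validate the download before handing it to the loader. The following is a minimal sketch, not part of langchainjs: the helper names looksLikePdf and fetchPdfBlob and the fetchImpl parameter are made up for illustration. It assumes a runtime with global fetch and Blob (Node 18+ or a browser). The signature check is deliberately strict: it only accepts files that start with "%PDF-", although some PDF readers tolerate extra bytes before that header.

```javascript
// Hypothetical helpers for illustration; they are not part of langchainjs.

// Returns true if the Blob starts with the "%PDF-" signature that PDF files begin with.
async function looksLikePdf(blob) {
  const head = await blob.slice(0, 5).text();
  return head === "%PDF-";
}

// Downloads `url` and returns the body as a Blob, throwing on HTTP errors
// or when the body does not start with the PDF signature.
// `fetchImpl` defaults to the global fetch and can be replaced with a stub in tests.
async function fetchPdfBlob(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  const blob = await response.blob();
  if (!(await looksLikePdf(blob))) {
    throw new Error(`Response from ${url} does not look like a PDF`);
  }
  return blob;
}
```

The returned Blob can then be passed to new WebPDFLoader(blob) in the same way as the data variable in the snippet above.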

You can find more information about this in the examples/src/document_loaders/web_pdf.ts file in the langchainjs repository.

Please let me know if you have any other questions or need further clarification.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

jacoblee93 commented 8 months ago

@dosubot's approach is the one I'd suggest. We could build this in, but I'll close this for now.

dosubot[bot] commented 8 months ago

🤖

It sounds like you're discussing a potential approach for a project. If you have any questions or need assistance with building or integrating language models using the langchainjs framework, feel free to ask! I'm here to help.



sohamsal commented 8 months ago

Not sure if I'm doing something wrong, but I keep getting this error with the code above: [image: screenshot of the error]