langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
12.31k stars 2.08k forks

UnstructuredLoader Base URL issue #6594

Open ogzhanolguncu opened 3 weeks ago

ogzhanolguncu commented 3 weeks ago

Example Code

The following code:

const loader = new UnstructuredLoader(filePath, { apiKey: process.env.UNSTRUCTURED_API_KEY });
const loadedData = await loader.load();

Error Message and Stack Trace (if applicable)

Fails with error 401 and message {"detail":"API key is invalid, please provide a valid API key in the header."}

Description

Following the UnstructuredLoader docs fails because of a base URL issue. In your codebase, in langchain/src/document_loaders/fs/unstructured.ts, apiUrl is https://api.unstructured.io/general/v0/general, but the unstructured.io docs give https://api.unstructuredapp.io/general/v0/general.

When I do this:

const loader = new UnstructuredLoader(filePath, { 
  apiKey: process.env.UNSTRUCTURED_API_KEY,
  apiUrl: "https://api.unstructuredapp.io" 
});
const loadedData = await loader.load();

It fails with:

error 405 and message {"detail":"Method Not Allowed"}

But if I call their SDK or endpoint directly, it works.

My LangChain community version is: "@langchain/community": "^0.2.28"

The issue might be something else: when I replaced the apiUrl I expected it to work, because the code looked fine to me. I'm using Bun.js, by the way, but I don't think that's the cause.
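One possible explanation for the 405, offered as an assumption rather than a confirmed diagnosis: if the loader sends its request to apiUrl exactly as given, then the bare host https://api.unstructuredapp.io is missing the /general/v0/general path that both URLs quoted above include. A minimal sketch of a hypothetical helper (withPartitionPath is not part of LangChain or the Unstructured SDK) that appends the path when it is missing:

```typescript
// Hypothetical helper, not part of LangChain or the Unstructured SDK.
// The path is taken from the two URLs quoted in this issue.
const PARTITION_PATH = "/general/v0/general";

function withPartitionPath(baseUrl: string): string {
  // Drop trailing slashes so the result never contains "//general/...".
  const trimmed = baseUrl.replace(/\/+$/, "");
  // Leave the URL alone if it already ends with the partition path.
  return trimmed.endsWith(PARTITION_PATH) ? trimmed : trimmed + PARTITION_PATH;
}
```

The result could then be passed to the loader as apiUrl: withPartitionPath("https://api.unstructuredapp.io"). Whether this clears the 405 depends on the assumption above holding for the installed @langchain/community version.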

System Info

platform: macos

node -v v20.12.2

bun -v 1.1.24

dosubot[bot] commented 3 weeks ago

Hey @ogzhanolguncu! I'm here to help you with your LangChain.js issue. I can assist with bug fixes, answer questions, and guide you to become a contributor. Let's tackle this together!

To resolve the UnstructuredLoader Base URL issue in LangChain.js, ensure that you are correctly setting both the apiUrl and apiKey in the UnstructuredLoaderOptions. Here is an example of how to properly configure the UnstructuredLoader:

import { UnstructuredLoader } from "@langchain/community/document_loaders/fs/unstructured";

const loader = new UnstructuredLoader("path/to/your/file.pdf", {
  apiKey: process.env.UNSTRUCTURED_API_KEY,
  apiUrl: "https://api.unstructured.io/general/v0/general", // Ensure this URL is correct
  strategy: "hi_res", // or any other strategy you prefer
  // other options if needed
});

loader.load().then((documents) => {
  console.log(documents);
}).catch((error) => {
  console.error("Error loading documents:", error);
});

Ensure that:

  1. The apiKey is valid and correctly set.
  2. The apiUrl is correctly set to https://api.unstructured.io/general/v0/general.

If you still encounter issues, double-check the API key and URL against the latest documentation from unstructured.io. Additionally, ensure that the file type you are trying to load is supported by the Unstructured API. The supported file types include .txt, .text, .pdf, .docx, .doc, .jpg, .jpeg, .eml, .html, .htm, .md, .pptx, .ppt, .msg, .rtf, .xlsx, .xls, .odt, and .epub [1][2].
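The extension list above can be turned into a quick pre-flight check before calling the loader. This is a minimal sketch: the list is copied from this comment rather than from the Unstructured documentation, and isSupportedExtension is a hypothetical helper, not a LangChain API.

```typescript
// Extensions copied from the comment above; the authoritative list may differ.
const SUPPORTED_EXTENSIONS = new Set<string>([
  ".txt", ".text", ".pdf", ".docx", ".doc", ".jpg", ".jpeg", ".eml",
  ".html", ".htm", ".md", ".pptx", ".ppt", ".msg", ".rtf", ".xlsx",
  ".xls", ".odt", ".epub",
]);

// Hypothetical helper: true if the file name ends in a listed extension.
function isSupportedExtension(filePath: string): boolean {
  // Keep only the last path segment, accepting both "/" and "\" separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  // No dot at all, or a leading dot only (e.g. ".env"), means no extension.
  if (dot <= 0) return false;
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}
```

A check like this only rules out an unsupported file type; it says nothing about the 401 or 405 responses reported in the issue.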
