langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

EPubLoader's peer dependency epub2/zipfile doesn't support node >= 12 #4937

Closed moritzWa closed 2 months ago

moritzWa commented 7 months ago

Example Code

try {
    const response = await fetch(file.url)
    let pageLevelDocs
    if (file.name.endsWith('.epub')) {
      const loader = new EPubLoader(file.url)
      pageLevelDocs = await loader.load()
    } else {
      const blob = await response.blob()
      const loader = new PDFLoader(blob)
      pageLevelDocs = await loader.load()
    }
} catch (error) {
    console.error(error)
}

The full code of this project can be found at https://github.com/moritzWa/ai-quote-finder/

Error Message and Stack Trace (if applicable)

next build:

Module not found: Can't resolve 'zipfile' in '/Users/m/Documents/ai-quote-finder/node_modules/epub2'
Did you mean './zipfile'?
Requests that should resolve in the current directory need to start with './'.
Requests that start with a name are treated as module requests and resolve within module directories (node_modules, /Users/m/Documents/ai-quote-finder).
If changing the source code is not an option there is also a resolve options called 'preferRelative' which tries to resolve these kind of requests in the current directory too.

Import trace for requested module:
./node_modules/epub2/zipfile.js
./node_modules/epub2/lib/epub.js
./node_modules/epub2/index.js
./node_modules/langchain/dist/document_loaders/fs/epub.js
./node_modules/langchain/document_loaders/fs/epub.js
./src/app/api/uploadthing/core.ts
./src/app/api/uploadthing/route.ts

production error:

err in onUploadComplete catch. err: Error: Invalid/missing file https://utfs.io/f/840c704f-c2ff-42ef-af74-ab34df80e14b-eev9ch.0.0.epub
    at EPub.open (webpack-internal:///(rsc)/./node_modules/epub2/lib/epub.js:92:32)
    at EPub.parse (webpack-internal:///(rsc)/./node_modules/epub2/lib/epub.js:79:14)
    at eval (webpack-internal:///(rsc)/./node_modules/epub2/index.js:38:18)
    at EPub.createAsync (webpack-internal:///(rsc)/./node_modules/epub2/index.js:23:16)
    at EPubLoader.load (webpack-internal:///(rsc)/./node_modules/langchain/dist/document_loaders/fs/epub.js:61:33)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.onUploadComplete [as resolver] (webpack-internal:///(rsc)/./src/app/api/uploadthing/core.ts:102:29)
    at async eval (webpack-internal:///(rsc)/./node_modules/uploadthing/server/index.js:355:25)
    at async Object.POST (webpack-internal:///(rsc)/./node_modules/uploadthing/server/index.js:1237:26)
    at async /Users/m/Documents/ai-quote-finder/node_modules/next/dist/compiled/next-server/app-route.runtime.dev.js:6:62609 {
  epub: EPub {
    _events: [Object: null prototype] {
      error: [Function: cb_err],
      end: [Function (anonymous)]
    },
    _eventsCount: 2,
    _maxListeners: undefined,
    filename: 'https://utfs.io/f/840c704f-c2ff-42ef-af74-ab34df80e14b-eev9ch.0.0.epub',
    imageroot: '/images/',
    linkroot: '/links/',
    containerFile: null,
    mimeFile: null,
    rootFile: null,
    metadata: {},
    manifest: {},
    spine: { toc: null, contents: [] },
    flow: [],
    toc: [],
    [Symbol(kCapture)]: false
  }
}
...
./node_modules/epub2/zipfile.js
Module not found: Can't resolve 'zipfile' in '/Users/m/Documents/ai-quote-finder/node_modules/epub2'
Did you mean './zipfile'?
Requests that should resolve in the current directory need to start with './'.
Requests that start with a name are treated as module requests and resolve within module directories (node_modules, /Users/m/Documents/ai-quote-finder).
If changing the source code is not an option there is also a resolve options called 'preferRelative' which tries to resolve these kind of requests in the current directory too.

Import trace for requested module:
./node_modules/epub2/zipfile.js
./node_modules/epub2/lib/epub.js
./node_modules/epub2/index.js
./node_modules/langchain/dist/document_loaders/fs/epub.js
./node_modules/langchain/document_loaders/fs/epub.js
./src/app/api/uploadthing/core.ts

Description

Calling new EPubLoader(file.url) and then loader.load() throws an error.

The epub2 dependency seems to rely on zipfile, which, according to this ticket and my errors, doesn't support Node.js 12 or higher. See (node-zipfile issue).

As a workaround, I tried downgrading my Node.js version, but that's not a viable solution for my project (Prisma only supports Node.js >= 16). I propose that langchain consider switching to a different EPUB parsing library, one that supports newer Node.js versions. This would make langchain usable in projects that can't run on old Node.js releases.

System Info

langchain@0.1.30 | MIT | deps: 17 | versions: 261
Typescript bindings for langchain
https://github.com/langchain-ai/langchainjs/tree/main/langchain/

keywords: llm, ai, gpt3, chain, prompt, prompt engineering, chatgpt, machine learning, ml, openai, embeddings, vectorstores

dist
.tarball: https://registry.npmjs.org/langchain/-/langchain-0.1.30.tgz
.shasum: e1adb3f1849fcd5c596c668300afd5dc8cb37a97
.integrity: sha512-5h/vNMmutQ98tbB0sPDlAileZVca6A2McFgGa3+D56Dm8mSSCzTQL2DngPA6h09DlKDpSr7+6PdFw5Hoj0ZDSw==
.unpackedSize: 4.1 MB

dependencies:
@anthropic-ai/sdk: ^0.9.1     js-tiktoken: ^1.0.7           ml-distance: ^4.0.0           zod-to-json-schema: ^3.22.3   
@langchain/community: ~0.0.41 js-yaml: ^4.1.0               openapi-types: ^12.1.3        zod: ^3.22.4                  
@langchain/core: ~0.1.44      jsonpointer: ^5.0.1           p-retry: 4                    
@langchain/openai: ~0.0.19    langchainhub: ~0.0.8          uuid: ^9.0.0                  
binary-extensions: ^2.2.0     langsmith: ~0.1.7             yaml: ^2.2.1                  

maintainers:
- nfcampos <nuno@boringbits.io>
- jacoblee93 <jacoblee93@gmail.com>
- davidduong <david@duong.cz>
- hwchase17 <hw.chase.17@gmail.com>
- basproul <braceasproul@gmail.com>

dist-tags:
latest: 0.1.30     next: 0.1.29-rc.0  

published 5 days ago by jacoblee93 <jacoblee93@gmail.com>

dosubot[bot] commented 7 months ago

Hey @moritzWa! 👋 I'm Dosu, back with some insights for you.

After reviewing the details you've provided, it's clear that the core issue stems from the epub2 library's dependency on zipfile, which indeed has compatibility issues with Node.js versions 12 and above. This is a known issue, as you've pointed out with the reference to the node-zipfile GitHub issue.

Given the constraints of your project, especially the need to use a Node.js version that's compatible with Prisma (which requires Node.js >= 16), downgrading Node.js isn't a viable workaround. Therefore, the most straightforward solution would be to find an alternative to the epub2 library that supports newer Node.js versions.

Here are a couple of steps you can take to move forward:

  1. Search for Alternatives: Look for alternative libraries that can parse EPUB files and are compatible with newer Node.js versions. Libraries like epubjs or node-epub might offer the functionality you need without the compatibility issues. Make sure to check their documentation and GitHub issues for any known compatibility concerns.

  2. Reach Out to LangChain Maintainers: Since epub2 is a dependency of LangChain, it would be beneficial to reach out to the LangChain maintainers. You can suggest they consider updating their dependencies to use a more modern library for EPUB parsing that supports newer Node.js versions. This would not only solve your issue but also benefit the broader LangChain user community.

  3. Consider Contributing: If you have the capacity and interest, consider contributing to the LangChain project by suggesting or even implementing the dependency update. Open source projects greatly benefit from contributions, and this could be a great way to improve LangChain for everyone.

  4. Temporary Workarounds: While searching for a long-term solution, you might explore temporary workarounds, such as using a microservice architecture where the EPUB parsing functionality runs in a separate service using a compatible Node.js version. This is more of a band-aid solution but might help you progress with your project in the short term.

I understand this issue is a significant roadblock for your project, and I hope these suggestions help you find a path forward. If you have any more questions or need further assistance, feel free to ask. Good luck! 🚀

moritzWa commented 7 months ago

My bad: the problem was that the EPubLoader constructor only accepts a local file path, not a URL. It might be worth clarifying this in the docs (I'm not sure how obvious it is to other people). Docs for reference:

import { EPubLoader } from "langchain/document_loaders/fs/epub";

const loader = new EPubLoader("src/document_loaders/example_data/example.epub");

const docs = await loader.load(); 

However, the server still logs Module not found: Can't resolve 'zipfile' in '/Users/m/Documents/ai-quote-finder/node_modules/epub2', so that part of the problem remains.
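One thing that might be worth trying for the remaining build message is to stop webpack from bundling epub2 at all, so that its require of zipfile is resolved by Node.js at runtime. The sketch below assumes a Next.js 13 or 14 App Router project (the import trace suggests this) and assumes that the zipfile require inside epub2 is optional; neither has been verified against this project. The NextConfigSketch type is a local stand-in so the snippet type-checks without next installed.

```typescript
// Hypothetical configuration sketch; not verified against this project.
// Local stand-in type so the snippet compiles without `next` installed.
type NextConfigSketch = {
  experimental?: { serverComponentsExternalPackages?: string[] };
};

const nextConfig: NextConfigSketch = {
  experimental: {
    // Ask Next.js to load epub2 with Node's own require() at runtime
    // instead of bundling it (and its `zipfile` require) with webpack.
    serverComponentsExternalPackages: ["epub2"],
  },
};

// In a real project this object would be the default export of the
// Next.js config file (drop the type annotation in a plain JS config).
```

In Next.js 15 the same option is a top-level key named serverExternalPackages.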

RoyBkker commented 5 months ago

So you're saying this can't be used in a serverless environment such as Next.js? My use case: someone uploads an EPUB on the frontend and it needs to be handled with EPubLoader, which apparently isn't possible...
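Since EPubLoader only takes a local file path, one possible approach for the upload case is to download the file into a temporary directory first and hand that path to the loader. This is a minimal sketch, not a tested fix: withDownloadedFile and FetchLike are hypothetical names introduced here, the fetch function is passed in explicitly so it can be stubbed, and the sketch assumes the runtime provides a writable temp directory (which not every serverless platform guarantees).

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Minimal shape of the fetch function this sketch relies on (hypothetical type).
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

/**
 * Downloads `url` into a fresh temp directory, passes the local file path
 * to `use`, and removes the directory afterwards (even if `use` throws).
 */
async function withDownloadedFile<T>(
  url: string,
  fetchFn: FetchLike,
  use: (localPath: string) => Promise<T>,
): Promise<T> {
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}: ${url}`);
  }
  const dir = await mkdtemp(join(tmpdir(), "epub-"));
  const localPath = join(dir, "upload.epub");
  try {
    await writeFile(localPath, new Uint8Array(await response.arrayBuffer()));
    return await use(localPath);
  } finally {
    // Clean up the temp directory whether or not `use` succeeded.
    await rm(dir, { recursive: true, force: true });
  }
}
```

With the code from the original post, the call would look roughly like withDownloadedFile(file.url, fetch, (path) => new EPubLoader(path).load()). The loader must finish reading inside the callback, because the temporary file is deleted as soon as the callback returns.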