langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Cannot import Firecrawl in NextJS `nodejs` route #6053

Open spencermize opened 1 month ago

spencermize commented 1 month ago

Checked other resources

Example Code

import { FireCrawlLoader } from '@langchain/community/document_loaders/web/firecrawl'

Error Message and Stack Trace (if applicable)

 Error [ERR_REQUIRE_ESM]: require() of ES Module node_modules/@mendable/firecrawl-js/build/index.js from node_modules/@langchain/community/dist/document_loaders/web/firecrawl.cjs not supported.
Instead change the require of index.js in node_modules/@langchain/community/dist/document_loaders/web/firecrawl.cjs to a dynamic import() which is available in all CommonJS modules.
    at mod.require (node_modules/next/dist/server/require-hook.js:65:28)
    at Object.<anonymous> (node_modules/@langchain/community/dist/document_loaders/web/firecrawl.cjs:7:40)
    at mod.require (node_modules/next/dist/server/require-hook.js:65:28)
    at Object.<anonymous> (node_modules/@langchain/community/document_loaders/web/firecrawl.cjs:1:18)
    at mod.require (node_modules/next/dist/server/require-hook.js:65:28)
    at @langchain/community/document_loaders/web/firecrawl (web/.next/server/app/api/inngest/route.js:22:18)
    at __webpack_require__ (web/.next/server/webpack-runtime.js:33:43)
    at eval (webpack-internal:///(rsc)/./src/services/llm/tools/SiteScraper.ts:5:109)
    at (rsc)/./src/services/llm/tools/SiteScraper.ts (web/.next/server/app/api/inngest/route.js:941:1)
    ...
    ...
    at (rsc)/./src/inngest/functions/index.ts (web/.next/server/app/api/inngest/route.js:652:1)
    at __webpack_require__ (web/.next/server/webpack-runtime.js:33:43)
    at eval (webpack-internal:///(rsc)/./src/app/api/inngest/route.ts:10:76)
    at (rsc)/./src/app/api/inngest/route.ts (web/.next/server/app/api/inngest/route.js:612:1)
    at __webpack_require__ (web/.next/server/webpack-runtime.js:33:43)
    at eval (webpack-internal:///(rsc)/../node_modules/next/dist/build/webpack/loaders/next-app-loader.js?name=app%2Fapi%2Finngest%2Froute&page=%2Fapi%2Finngest%2Froute&appPaths=&pagePath=private-next-app-dir%2Fapi%2Finngest%2Froute.ts&appDir=2Fweb%2Fsrc%2Fapp&pageExtensions=tsx&pageExtensions=ts&pageExtensions=jsx&pageExtensions=js&rootDir=%2Fweb&isDev=true&tsconfigPath=tsconfig.json&basePath=&assetPrefix=&nextConfigOutput=&preferredRegion=&middlewareConfig=e30%3D!:15:121)
    at (rsc)/../node_modules/next/dist/build/webpack/loaders/next-app-loader.js?name=app%2Fapi%2Finngest%2Froute&page=%2Fapi%2Finngest%2Froute&appPaths=&pagePath=private-next-app-dir%2Fapi%2Finngest%2Froute.ts&appDir=%2Fweb%2Fsrc%2Fapp&pageExtensions=tsx&pageExtensions=ts&pageExtensions=jsx&pageExtensions=js&rootDir=%2Fweb&isDev=true&tsconfigPath=tsconfig.json&basePath=&assetPrefix=&nextConfigOutput=&preferredRegion=&middlewareConfig=e30%3D! (web/.next/server/app/api/inngest/route.js:572:1) {
  code: 'ERR_REQUIRE_ESM',
  page: '/api/inngest'
}
 PUT /api/inngest 500 in 36ms

Description

The import above fails in NextJS 14.2.5 when it is used in a route whose runtime is nodejs.

Here's my tsconfig's potentially applicable section:

    "target": "esnext",
    "lib": [
      "dom",
      "dom.iterable",
      "esnext"
    ],
    "allowArbitraryExtensions": true,
    "allowImportingTsExtensions": true,
    "allowJs": true,
    "skipLibCheck": true,
    "strict": true,
    "noEmit": true,
    "esModuleInterop": true,
    "module": "esnext",
    "moduleResolution": "bundler",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "downlevelIteration": true,
    "jsx": "preserve",
    "incremental": true,

Nothing special in our next.config.js.

System Info

yarn info langchain
└─ langchain@npm:0.2.8
   ├─ Instances: 1
   ├─ Version: 0.2.8
   │ 
   └─ Dependencies
      ├─ @langchain/core@npm:>=0.2.9 <0.3.0 → npm:0.2.12
      ├─ @langchain/openai@npm:>=0.1.0 <0.3.0 → npm:0.2.1
      ├─ @langchain/textsplitters@npm:~0.0.0 → npm:0.0.2
      ├─ binary-extensions@npm:^2.2.0 → npm:2.2.0
      ├─ js-tiktoken@npm:^1.0.12 → npm:1.0.12
      ├─ js-yaml@npm:^4.1.0 → npm:4.1.0
      ├─ jsonpointer@npm:^5.0.1 → npm:5.0.1
      ├─ langchainhub@npm:~0.0.8 → npm:0.0.8
      ├─ langsmith@npm:~0.1.30 → npm:0.1.34
      ├─ ml-distance@npm:^4.0.0 → npm:4.0.1
      ├─ openapi-types@npm:^12.1.3 → npm:12.1.3
      ├─ p-retry@npm:4 → npm:4.6.2
      ├─ uuid@npm:^9.0.0 → npm:9.0.1
      ├─ yaml@npm:^2.2.1 → npm:2.4.1
      ├─ zod-to-json-schema@npm:^3.22.3 → npm:3.23.0
      └─ zod@npm:^3.22.4 → npm:3.23.8

macOS, Node 20.11.1, Yarn 4.0.2

dosubot[bot] commented 1 month ago

To resolve the ERR_REQUIRE_ESM error when importing FireCrawlLoader from @langchain/community/document_loaders/web/firecrawl in a NextJS nodejs route, you need to make sure Node.js loads your project as ES modules rather than CommonJS. You can do this by setting the type field in your package.json to "module" and using the .mjs file extension for your module files.

Here is how you can modify your setup:

  1. Update package.json:

    {
     "type": "module"
    }
  2. Use the .mjs extension: rename your route file from yourfile.js to yourfile.mjs.

  3. Use import syntax in your .mjs file:

    import { FireCrawlLoader } from "@langchain/community/document_loaders/web/firecrawl";
    
    const loader = new FireCrawlLoader({
     url: "https://firecrawl.dev", // The URL to scrape
     apiKey: process.env.FIRECRAWL_API_KEY, // Optional, defaults to `FIRECRAWL_API_KEY` in your env.
     mode: "scrape", // The mode to run the crawler in. Can be "scrape" for single URLs or "crawl" for all accessible subpages
     params: {
       // optional parameters based on Firecrawl API docs
       // For API documentation, visit https://docs.firecrawl.dev
     },
    });
    
    const docs = await loader.load();

By following these steps, you should be able to resolve the ERR_REQUIRE_ESM error and successfully import and use FireCrawlLoader in your NextJS nodejs route [1].


bracesproul commented 1 month ago

cc @ericciarla have you seen anything like this before?

spencermize commented 1 month ago

Update: if I use "type": "module" in my package.json, the problem disappears. Is that a requirement of LangChain?

jacoblee93 commented 1 month ago

@spencermize No, but some third-party packages natively support only one of ES modules or CommonJS; we may be able to shim it in. Will ping their team as well.

ericciarla commented 1 month ago

We just added an issue for this!

spencermize commented 1 month ago

You rock - thanks! In the immediate term, we're good with running "type": "module", so we can close this ticket unless you want to keep it open.

toboid commented 1 month ago

Another possible solution would be to swap the import of @mendable/firecrawl-js for a dynamic one (similar to the pattern here), since a dynamic import() of that package should work from CommonJS (ref).

Sounds like this is in hand now, but if not, I'd be happy to contribute that change.
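toboid's dynamic-import suggestion can be sketched roughly as follows. This is a minimal illustration, not the fix that was eventually shipped (Firecrawl added CommonJS support instead). The helper name `importEsmOnly` is hypothetical, and the sketch assumes the compiled CommonJS output keeps `import()` as a real dynamic import.

```typescript
// Hypothetical helper illustrating the dynamic-import workaround.
// Assumption: a loader would pass "@mendable/firecrawl-js" as the specifier.

// One in-flight promise per specifier, so concurrent callers share it.
const pendingImports = new Map<string, Promise<unknown>>();

export function importEsmOnly<T>(specifier: string): Promise<T> {
  let pending = pendingImports.get(specifier);
  if (pending === undefined) {
    // import() is evaluated only when this function is called, not when the
    // enclosing CommonJS file is first require()d, and unlike require() it
    // can load an ES module from CommonJS code.
    pending = import(specifier);
    pendingImports.set(specifier, pending);
  }
  return pending as Promise<T>;
}

// Hypothetical usage inside a loader method, assuming the package's default
// export is the client class:
// const { default: FirecrawlApp } = await importEsmOnly<
//   typeof import("@mendable/firecrawl-js")
// >("@mendable/firecrawl-js");
```

Caching the promise is optional, since Node.js already caches loaded modules; it only lets concurrent callers share a single in-flight import. Note that TypeScript compiled with `"module": "commonjs"` rewrites `import()` into a `require()` call, which would bring ERR_REQUIRE_ESM back, so a build using this pattern needs to emit a native `import()` (for example with `"module": "node16"`).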

nickscamara commented 1 month ago

@jacoblee93 @bracesproul We just pushed CommonJS support for it. Version 0.0.31. 🚀

Edit: version 0.0.35 fixes it, not 0.0.31.

jacoblee93 commented 1 month ago

Ah, awesome! Will push a new version with the updated dependency. We should also relax the version range to use ~, I think.
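Relaxing the range to `~` matters for a 0.0.x dependency because of how npm ranges treat leading zeros. The thread does not say how the dependency was declared before, but if it used a caret or an exact version, `^0.0.35` pins exactly 0.0.35, while `~0.0.35` accepts any 0.0.x release from 0.0.35 up to (not including) 0.1.0. The minimal sketch below encodes those two rules for plain `major.minor.patch` versions only; the function names are hypothetical, prerelease tags are ignored, and real tooling should use the `semver` package.

```typescript
// Simplified tilde (~) and caret (^) range checks for plain x.y.z versions.
// Ignores prerelease tags and build metadata on purpose.

type Version = [number, number, number];

function parse(version: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`Unsupported version: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// ~x.y.z allows patch-level changes: >=x.y.z <x.(y+1).0
export function satisfiesTilde(version: string, base: string): boolean {
  const v = parse(version);
  const [major, minor, patch] = parse(base);
  return compare(v, [major, minor, patch]) >= 0 && compare(v, [major, minor + 1, 0]) < 0;
}

// ^x.y.z freezes the left-most non-zero component, so ^0.0.z is an exact pin.
export function satisfiesCaret(version: string, base: string): boolean {
  const v = parse(version);
  const [major, minor, patch] = parse(base);
  let upper: Version;
  if (major > 0) upper = [major + 1, 0, 0];
  else if (minor > 0) upper = [0, minor + 1, 0];
  else upper = [0, 0, patch + 1];
  return compare(v, [major, minor, patch]) >= 0 && compare(v, upper) < 0;
}
```

For example, `satisfiesTilde("0.0.36", "0.0.35")` is true while `satisfiesCaret("0.0.36", "0.0.35")` is false, which is why `~` lets a future Firecrawl patch release through without a new LangChain publish.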

nickscamara commented 1 month ago

@jacoblee93 Apologies, version 0.0.35 fixes it (not 0.0.31)! Thank you!