mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
14.16k stars 1.02k forks

[BUG] worker-1 exited with code 1 and api-1 exited with code 1 #442

Open Gedanke opened 1 month ago

Gedanke commented 1 month ago

Describe the Bug

I tried to start Firecrawl from docker.

$ docker compose up
WARN[0000] docker-compose.yaml: `version` is obsolete
[+] Running 5/5
 ✔ Network firecrawl_backend                 Created                                                               0.1s
 ✔ Container firecrawl-redis-1               Created                                                               0.0s
 ✔ Container firecrawl-playwright-service-1  Created                                                               0.0s
 ✔ Container firecrawl-api-1                 Created                                                               0.0s
 ✔ Container firecrawl-worker-1              Created                                                               0.0s
Attaching to api-1, playwright-service-1, redis-1, worker-1
redis-1               | 1:C 21 Jul 2024 02:24:56.298 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-1               | 1:C 21 Jul 2024 02:24:56.298 * Redis version=7.2.5, bits=64, commit=00000000, modified=0, pid=1, just started
redis-1               | 1:C 21 Jul 2024 02:24:56.298 * Configuration loaded
redis-1               | 1:M 21 Jul 2024 02:24:56.299 * monotonic clock: POSIX clock_gettime
redis-1               | 1:M 21 Jul 2024 02:24:56.299 * Running mode=standalone, port=6379.
redis-1               | 1:M 21 Jul 2024 02:24:56.300 * Server initialized
redis-1               | 1:M 21 Jul 2024 02:24:56.300 * Ready to accept connections tcp
playwright-service-1  | [2024-07-21 02:24:58 +0000] [9] [INFO] Running on http://[::]:3000 (CTRL + C to quit)
worker-1              | /usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22762
worker-1              |     throw new Error(
worker-1              |           ^
worker-1              |
worker-1              | Error: Error when performing the request to https://registry.npmjs.org/pnpm/latest; for troubleshooting help, see https://github.com/nodejs/corepack#troubleshooting
worker-1              |     at fetch (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22762:11)
worker-1              |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
worker-1              |     at async fetchAsJson (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22776:20)
worker-1              |     ... 4 lines matching cause stack trace ...
worker-1              |     at async Object.runMain (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24235:5) {
worker-1              |   [cause]: TypeError: fetch failed
worker-1              |       at node:internal/deps/undici/undici:12502:13
worker-1              |       at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
worker-1              |       at async fetch (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22756:16)
worker-1              |       at async fetchAsJson (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22776:20)
worker-1              |       at async fetchLatestStableVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22703:20)
worker-1              |       at async fetchLatestStableVersion2 (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22826:14)
worker-1              |       at async Engine.getDefaultVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:23436:23)
worker-1              |       at async Engine.executePackageManagerRequest (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:23528:47)
worker-1              |       at async Object.runMain (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24235:5) {
worker-1              |     [cause]: ConnectTimeoutError: Connect Timeout Error (attempted addresses: 104.16.25.34:443, 104.16.29.34:443)
worker-1              |         at onConnectTimeout (node:internal/deps/undici/undici:6635:28)
worker-1              |         at node:internal/deps/undici/undici:6587:50
worker-1              |         at Immediate._onImmediate (node:internal/deps/undici/undici:6619:13)
worker-1              |         at process.processImmediate (node:internal/timers:478:21) {
worker-1              |       code: 'UND_ERR_CONNECT_TIMEOUT'
worker-1              |     }
worker-1              |   }
worker-1              | }
worker-1              |
worker-1              | Node.js v20.15.1
worker-1 exited with code 1
api-1                 | /usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22762
api-1                 |     throw new Error(
api-1                 |           ^
api-1                 |
api-1                 | Error: Error when performing the request to https://registry.npmjs.org/pnpm/latest; for troubleshooting help, see https://github.com/nodejs/corepack#troubleshooting
api-1                 |     at fetch (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22762:11)
api-1                 |     at async fetchAsJson (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22776:20)
api-1                 |     ... 4 lines matching cause stack trace ...
api-1                 |     at async Object.runMain (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24235:5) {
api-1                 |   [cause]: TypeError: fetch failed
api-1                 |       at node:internal/deps/undici/undici:12502:13
api-1                 |       at async fetch (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22756:16)
api-1                 |       at async fetchAsJson (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22776:20)
api-1                 |       at async fetchLatestStableVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22703:20)
api-1                 |       at async fetchLatestStableVersion2 (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22826:14)
api-1                 |       at async Engine.getDefaultVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:23436:23)
api-1                 |       at async Engine.executePackageManagerRequest (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:23528:47)
api-1                 |       at async Object.runMain (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24235:5) {
api-1                 |     [cause]: HeadersTimeoutError: Headers Timeout Error
api-1                 |         at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:7569:32)
api-1                 |         at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:6659:17)
api-1                 |         at listOnTimeout (node:internal/timers:573:17)
api-1                 |         at process.processTimers (node:internal/timers:514:7) {
api-1                 |       code: 'UND_ERR_HEADERS_TIMEOUT'
api-1                 |     }
api-1                 |   }
api-1                 | }
api-1                 |
api-1                 | Node.js v20.15.1
api-1 exited with code 1

worker-1 and api-1 both exited with code 1. The images build, but these two containers will not run.

$ docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS      NAMES
af0042698c05   firecrawl-playwright-service   "/bin/sh -c 'hyperco…"   27 minutes ago   Up 27 minutes              firecrawl-playwright-service-1
a9ea3b5d74c8   redis:alpine                   "docker-entrypoint.s…"   27 minutes ago   Up 27 minutes   6379/tcp   firecrawl-redis-1
$ curl https://registry.npmjs.org/pnpm/latest
{
    "name": "pnpm",
    "version": "9.5.0",
    "keywords": ["pnpm9", "dependency manager", "install", "installer", "uninstall", "remove", "link", "prune", "shrinkwrap", "lockfile", "fast", "rapid", "efficient", "package.json", "packages", "dependencies", "symlinks", "hardlinks", "modules", "npm", "package manager", "monorepo", "multi-package", "workspace:*"],
    "license": "MIT",
    "_id": "pnpm@9.5.0",
    "maintainers": [{
        "name": "zkochan",
        "email": "z@kochan.io"
    }, {
        "name": "pnpmuser",
        "email": "publish-bot@pnpm.io"
    }],
    "homepage": "https://pnpm.io",
    "bugs": {
        "url": "https://github.com/pnpm/pnpm/issues"
    },
    "bin": {
        "pnpm": "bin/pnpm.cjs",
        "pnpx": "bin/pnpx.cjs"
    },
    "dist": {
        "shasum": "8c155dc114e1689d18937974f6571e0ceee66f1d",
        "tarball": "https://registry.npmjs.org/pnpm/-/pnpm-9.5.0.tgz",
        "fileCount": 880,
        "integrity": "sha512-FAA2gwEkYY1iSiGHtQ0EKJ1aCH8ybJ7fwMzXM9dsT1LDoxPU/BSHlKKp2BVTAWAE5nQujPhQZwJopzh/wiDJAw==",
        "signatures": [{
            "sig": "MEQCICzcf5DogqT1evYK9hX/oKA1fg0scwuP+oCzvb1rd26kAiB2bRnpayysjOt5HLoX3oVpCX+8vFJ0eDA3qQBpHIrDCA==",
            "keyid": "SHA256:jl3bwswu80PjjokCgh0o2w5c2U4LhQAE57gj9cz1kzA"
        }],
        "unpackedSize": 17664848
    },
    "main": "bin/pnpm.cjs",
    "_from": "file:pnpm-9.5.0.tgz",
    "unpkg": "dist/pnpm.cjs",
    "engines": {
        "node": ">=18.12"
    },
    "exports": {
        ".": "./package.json"
    },
    "funding": "https://opencollective.com/pnpm",
    "scripts": {
        "lint": "eslint \"src/**/*.ts\" \"test/**/*.ts\"",
        "test": "pnpm run compile && pnpm run _test",
        "_test": "cross-env PNPM_REGISTRY_MOCK_PORT=7776 jest",
        "start": "tsc --watch",
        "bundle": "ts-node bundle.ts",
        "compile": "tsc --build && pnpm run lint --fix && rimraf dist bin/nodes && pnpm run bundle && shx cp -r node-gyp-bin dist/node-gyp-bin && shx cp -r node_modules/@pnpm/tabtab/lib/templates dist/templates && shx cp -r node_modules/ps-list/vendor dist/vendor && shx cp pnpmrc dist/pnpmrc",
        "_compile": "tsc --build",
        "pretest:e2e": "rimraf node_modules/.bin/pnpm"
    },
    "_npmUser": {
        "name": "pnpmuser",
        "email": "publish-bot@pnpm.io"
    },
    "_resolved": "/tmp/6105e9d949f50a3bb187ab3ae86654d6/pnpm-9.5.0.tgz",
    "_integrity": "sha512-FAA2gwEkYY1iSiGHtQ0EKJ1aCH8ybJ7fwMzXM9dsT1LDoxPU/BSHlKKp2BVTAWAE5nQujPhQZwJopzh/wiDJAw==",
    "repository": {
        "url": "git+https://github.com/pnpm/pnpm.git",
        "type": "git"
    },
    "_npmVersion": "10.7.0",
    "description": "Fast, disk space efficient package manager",
    "directories": {
        "test": "test"
    },
    "_nodeVersion": "18.20.3",
    "preferGlobal": true,
    "publishConfig": {
        "tag": "next-9",
        "executableFiles": ["./dist/node-gyp-bin/node-gyp", "./dist/node-gyp-bin/node-gyp.cmd", "./dist/node_modules/node-gyp/bin/node-gyp.js"]
    },
    "_hasShrinkwrap": false,
    "_npmOperationalInternal": {
        "tmp": "tmp/pnpm_9.5.0_1720370654593_0.5246475969538305",
        "host": "s3://npm-registry-packages"
    }
}

https://registry.npmjs.org/pnpm/latest is accessible.

So what is causing this failure?

nickscamara commented 1 month ago

@rafaelsideguide any ideas why this is happening?

rafaelsideguide commented 1 month ago

Hey @Gedanke ! I just reviewed our docker implementation and wasn't able to reproduce this error; I was able to run both scrape and crawl with no problems. This might not be relevant, but which system are you using? Could you please try updating your Docker and rebuild the containers without cache?

cypher256 commented 1 month ago

@Gedanke Are you using a proxy? I am also using a proxy and have the same issue.

dgedanke commented 1 month ago

> @Gedanke Are you using a proxy? I am also using a proxy and have the same issue.

> Hey @Gedanke ! I just reviewed our docker implementation and wasn't able to reproduce this error; I was able to run both scrape and crawl with no problems. This might not be relevant, but which system are you using? Could you please try updating your Docker and rebuild the containers without cache?

Thank you very much! This is running under WSL2 on my computer. Yesterday I built Firecrawl from source, and that build serves requests normally. For the past few days, though, I have not been able to start it from Docker. My computer has been using a proxy the whole time, and I was still able to build with Docker normally around the beginning of July.

dgedanke commented 1 month ago

> @Gedanke Are you using a proxy? I am also using a proxy and have the same issue.

Yes. My computer has been using a proxy all along, and around the beginning of July I could still build with Docker normally. Recently, though, I have not been able to start it from Docker. The same problem also occurs on another server that does not use a proxy.

rafaelsideguide commented 1 week ago

@dgedanke @cypher256 are you still facing this issue? Added stale tag as we didn't receive any comment on this issue for more than 40 days.

Gedanke commented 6 days ago

> @dgedanke @cypher256 are you still facing this issue? Added stale tag as we didn't receive any comment on this issue for more than 40 days.

I'm sorry for taking so long to reply. I may have found the problem last month. My environment cannot reach https://registry.npmjs.org, so I had configured npm to use the mirror https://registry.npmmirror.com.

The instruction RUN corepack enable makes Corepack fetch pnpm from the official https://registry.npmjs.org, not from my configured mirror https://registry.npmmirror.com.

Even if I add RUN npm config set registry https://registry.npmmirror.com at the end, or write https://registry.npmmirror.com into the configuration file, the setting still does not take effect in the new image.

I ended up solving the problem this way:

# RUN corepack enable
# use npm to install pnpm, skip corepack
RUN npm config set registry https://registry.npmmirror.com
RUN npm install -g pnpm && \
    pnpm config set registry https://registry.npmmirror.com

After that, the API responds:

$ curl http://127.0.0.1:3002/test
Hello, world!

But this is not a good solution. Disabling Corepack and giving up its management of the package manager version could be risky.
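
A possible alternative that keeps Corepack enabled, offered as an untested sketch rather than a confirmed fix for this issue: Corepack's documentation describes a COREPACK_NPM_REGISTRY environment variable that sets the registry it uses when downloading package managers. Corepack does not appear to take this value from npm config, which may be why the npm config set registry attempts above had no effect. The lines below assume that the Corepack version bundled in the image honors this variable and that the mirror serves the pnpm metadata and tarballs; neither was verified in this thread.

```dockerfile
# Sketch only, not verified against the Firecrawl images.
# ENV (rather than ARG) is used so the value is also present when the
# container starts, which is where the logs above show Corepack
# requesting https://registry.npmjs.org/pnpm/latest.
ENV COREPACK_NPM_REGISTRY=https://registry.npmmirror.com

# Keep Corepack enabled so it continues to manage the pnpm version.
RUN corepack enable

# Point pnpm itself at the mirror for dependency installs.
# This first pnpm call makes Corepack download pnpm at build time.
RUN pnpm config set registry https://registry.npmmirror.com
```

If this works in a given environment, the npm install -g pnpm workaround becomes unnecessary and Corepack stays in charge of which pnpm version is used.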