hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0

Failed to connect to the browser instance, will retry in 5 secs #674

Open · snowdream opened this issue 2 days ago

snowdream commented 2 days ago

### Describe the Bug

https://docs.hoarder.app/Installation/docker

I tried to run Hoarder with Docker Compose, but it failed.


### Steps to Reproduce

  1. Create `.env`:

    HOARDER_VERSION=release
    NEXTAUTH_SECRET=super_random_string
    MEILI_MASTER_KEY=another_random_string
    NEXTAUTH_URL=http://localhost:3000
  2. Create `docker-compose.yml`:

    
    version: "3.8"
    services:
      web:
        image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
        restart: unless-stopped
        volumes:
          - data:/data
        ports:
          - 3000:3000
        env_file:
          - .env
        environment:
          MEILI_ADDR: http://meilisearch:7700
          BROWSER_WEB_URL: http://chrome:9222
          # OPENAI_API_KEY: ...
          DATA_DIR: /data
      chrome:
        image: gcr.io/zenika-hub/alpine-chrome:123
        restart: unless-stopped
        command:
          - --no-sandbox
          - --disable-gpu
          - --disable-dev-shm-usage
          - --remote-debugging-address=0.0.0.0
          - --remote-debugging-port=9222
          - --hide-scrollbars
      meilisearch:
        image: getmeili/meilisearch:v1.11.1
        restart: unless-stopped
        env_file:
          - .env
        environment:
          MEILI_NO_ANALYTICS: "true"
        volumes:
          - meilisearch:/meili_data

    volumes:
      meilisearch:
      data:


  3. Run `docker compose up -d`.



### Expected Behaviour

http://localhost:3000/ loads successfully.

### Screenshots or Additional Context

<img width="1439" alt="image" src="https://github.com/user-attachments/assets/67dec34d-892c-4f0b-b070-f3be9d19b378">

### Device Details

Microsoft Edge Version 131.0.2903.48 (Official build) (x86_64) on macOS

### Exact Hoarder Version

release

Azhelor commented 2 days ago

I have a similar error with the latest Hoarder version. The app can be used, but when I add a bookmark, it can't retrieve any image or description.

Crush-RY commented 14 hours ago

Same issue here.

kamtschatka commented 12 hours ago

Is everyone using Docker Desktop? We have seen before that networking works differently on e.g. Windows and Linux.
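
If it helps, a quick sketch to check whether the web container can reach the chrome container over the compose network (assumptions: the service names from the files in this thread, and Node on PATH in the web image, which should hold since it runs Next.js):

    docker compose exec web node -e "fetch('http://chrome:9222/json/version').then(r => r.text()).then(console.log, console.error)"

Note that newer Chrome builds may reject a non-IP Host header on the debugging port, but getting any HTTP response at all still proves the network path works; a connection error or DNS failure points at the Docker networking instead.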

Crush-RY commented 12 hours ago

My deployment system is Linux, and this is my config file:

version: "3.8"
networks:
  traefiknet:
    external: true
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    container_name: hoarder
    volumes:
      - /opt/mydocker/hoarder/data:/data
    ports:
      - 54110:3000
    env_file:
      - .env
    networks:
      - traefiknet
    labels:
      - traefik.docker.network=traefiknet
      - traefik.enable=true
      - traefik.http.routers.hoarder.rule=Host(`hoarder.my.domain`)
      - traefik.http.routers.hoarder.entrypoints=http,https
      - traefik.http.routers.hoarder.priority=10
      - traefik.http.routers.hoarder.tls=true
      - traefik.http.services.hoarder.loadbalancer.server.port=3000
      - traefik.http.routers.hoarder.tls.certresolver=mycloudflare

  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    container_name: chrome
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
    networks:
      - traefiknet
  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    container_name: meilisearch
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /opt/mydocker/hoarder/meilisearch:/meili_data
    networks:
      - traefiknet

MohamedBassem commented 5 hours ago

@Crush-RY can you share the logs from the web container?
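
A minimal sketch for grabbing them, assuming the compose service is named `web` as in the files above (Crush-RY's file also sets `container_name: hoarder`, so `docker logs hoarder` would work too):

    docker compose logs --tail=200 web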

MohamedBassem commented 5 hours ago

Hmmm, it seems like multiple people are hitting this now, so I'll label this as a bug until we figure out what's going on.

MohamedBassem commented 5 hours ago

Was anyone running Hoarder before and hit this problem after an upgrade, or are these all new installations?

MohamedBassem commented 5 hours ago

I've just pushed https://github.com/hoarder-app/hoarder/commit/393d097c965c9bc223e9660b689df6a0312e9222 to log more details on the connection failure reason. It'll take about 15 minutes for the container to be built. Once it's built, can someone switch to the nightly build and capture the error for me?
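
For anyone trying it, a minimal sketch of switching to nightly, assuming `HOARDER_VERSION` is read from `.env` as in the reporter's compose file:

    # In .env, switch the image tag:
    HOARDER_VERSION=nightly

Then pull the new image and recreate the service:

    docker compose pull web
    docker compose up -d web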

Azhelor commented 5 hours ago

Sure, here it is:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 411ms

> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts

2024-11-21T22:48:49.735Z info: Workers version: nightly
2024-11-21T22:48:49.748Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-21T22:48:49.763Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)

(process:69): VIPS-WARNING **: 22:49:40.996: threads clipped to 1024
2024-11-21T22:51:10.022Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: FetchError: request to https://raw.githubusercontent.com/cliqz-oss/adblocker/master/packages/adblocker/assets/easylist/easylist.txt failed, reason: getaddrinfo EAI_AGAIN raw.githubusercontent.com
    at ClientRequest.<anonymous> (/app/apps/workers/node_modules/.pnpm/node-fetch@2.7.0/node_modules/node-fetch/lib/index.js:1501:11)
    at ClientRequest.emit (node:events:518:28)
    at ClientRequest.emit (node:domain:489:12)
    at emitErrorEvent (node:_http_client:103:11)
    at TLSSocket.socketErrorListener (node:_http_client:506:5)
    at TLSSocket.emit (node:events:518:28)
    at TLSSocket.emit (node:domain:489:12)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
2024-11-21T22:51:10.023Z info: Starting crawler worker ...
2024-11-21T22:51:10.025Z info: Starting inference worker ...
2024-11-21T22:51:10.026Z info: Starting search indexing worker ...
2024-11-21T22:51:10.027Z info: Starting tidy assets worker ...
2024-11-21T22:51:10.028Z info: Starting video worker ...
2024-11-21T22:51:10.029Z info: Starting feed worker ...
2024-11-21T22:51:10.171Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:10.171Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:15.023Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-11-21T22:51:15.174Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:15.224Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:15.249Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:15.249Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:20.173Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:20.251Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:20.251Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:20.273Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:20.273Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:25.274Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:25.275Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:25.303Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:25.303Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:30.214Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:30.304Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:30.304Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:30.325Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:30.326Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:35.326Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:35.327Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:35.373Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:35.374Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:40.243Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
    at node:internal/deps/undici/undici:13392:13
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:40.374Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:40.374Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
    at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
    at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
    at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578) 

Hope that helps.

And to answer your earlier question: I have only been running Hoarder for a few days, and I have had this error from the beginning.

MohamedBassem commented 4 hours ago

Yeah, this is actually very helpful. I think I know how I can fix that!

MohamedBassem commented 4 hours ago

So basically what's happening here is that, for one reason or another (your network policies, GitHub being blocked, etc.), Hoarder is failing to download the adblock list used in the crawler. I've sent https://github.com/hoarder-app/hoarder/commit/378ad9bc157fb7741e09cdb687a97c82c2851578 to ensure that this doesn't block worker startup. In your case, you might also want to set CRAWLER_ENABLE_ADBLOCKER=false so that you don't block the worker's startup each time, given that the download always fails. Can you give it a try once the container is built?
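
For reference, a minimal sketch of that change (the variable name comes from the comment above; the recreate step is one common way to apply `.env` changes):

    # In .env, skip the adblock list download at crawler startup:
    CRAWLER_ENABLE_ADBLOCKER=false

    # Then recreate the service so the new environment is picked up:
    docker compose up -d --force-recreate web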

Azhelor commented 3 hours ago

Thanks again for your very quick answer. I tried the fix you pushed and also added the line you suggested to the `.env` file, but unfortunately it does not work.

Here are the logs:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
  ▲ Next.js 14.2.13
  - Local:        http://localhost:3000
  - Network:      http://0.0.0.0:3000

 ✓ Starting...
 ✓ Ready in 358ms

> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts

(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.170Z info: Workers version: nightly
2024-11-22T00:50:20.182Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.199Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
2024-11-22T00:50:20.324Z info: Starting crawler worker ...
2024-11-22T00:50:20.325Z info: Starting inference worker ...
2024-11-22T00:50:20.325Z info: Starting search indexing worker ...
2024-11-22T00:50:20.326Z info: Starting tidy assets worker ...
2024-11-22T00:50:20.326Z info: Starting video worker ...
2024-11-22T00:50:20.326Z info: Starting feed worker ...
2024-11-22T00:50:20.365Z info: [Crawler][22] Will crawl "https://www.wikipedia.org/" for link with id "m2wi6yovvkafmnjegdic7b6c"
2024-11-22T00:50:20.365Z info: [Crawler][22] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:20.462Z info: [search][23] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:20.594Z info: [search][23] Completed successfully
2024-11-22T00:50:25.370Z error: [Crawler][22] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
s [TRPCError]: Bookmark not found
    at /app/apps/web/.next/server/chunks/6815.js:1:16914
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:32333)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
    at async t (/app/apps/web/.next/server/chunks/440.js:4:33299)
    at async /app/apps/web/.next/server/app/api/trpc/[trpc]/route.js:1:4379
    at async Promise.all (index 1) {
  code: 'NOT_FOUND',
  [cause]: undefined
}
2024-11-22T00:50:34.670Z info: [search][24] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:34.758Z info: [search][24] Completed successfully
2024-11-22T00:50:35.751Z error: [Crawler][22] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.772Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.790Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.807Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.826Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.847Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
    at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)

(process:69): VIPS-WARNING **: 00:50:42.563: threads clipped to 1024
2024-11-22T00:50:42.890Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:42.891Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:42.909Z info: [search][26] Attempting to index bookmark with id c77a1dclbtoswxfg1dehix2z ...
2024-11-22T00:50:42.989Z info: [search][26] Completed successfully
2024-11-22T00:50:47.893Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:50:58.038Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:58.060Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:58.060Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:03.061Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:51:13.201Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
    at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
    at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
    at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
    at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
    at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
    at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
    at async Object.run (/app/apps/workers/utils.ts:2:1459)
    at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:51:13.224Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:51:13.224Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:18.225Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.

MohamedBassem commented 3 hours ago

OK, now it's clear that you have some DNS/internet problems in the container :) Basically, your container can't resolve DNS, and the crawler needs that to work. This is not a Hoarder problem at this point.
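
For anyone debugging the same symptom, a quick sketch to confirm DNS is the culprit (assumptions: Node is available in the web image, and busybox `nslookup` exists in the alpine-chrome image):

    # From the web container, where the workers run:
    docker compose exec web node -e "require('dns').promises.lookup('raw.githubusercontent.com').then(console.log, console.error)"

    # From the chrome container, where pages are actually rendered:
    docker compose exec chrome nslookup www.wikipedia.org

If these fail, check the Docker daemon's DNS settings (e.g. the `dns` entry in /etc/docker/daemon.json) or any egress firewall rules on the host.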

snowdream commented 2 hours ago

As you know, I am in China.

Does Hoarder access any API that I cannot access?