[Open] snowdream opened this issue 2 days ago
I have a similar error with the latest Hoarder version. The app itself works, but when I add a bookmark, it fails to retrieve any image or description.
Same issue here.
Is everyone here using Docker Desktop? We've seen before that networking works differently on e.g. Windows and Linux.
My deployment system is Linux, and this is my config file:
version: "3.8"

networks:
  traefiknet:
    external: true

services:
  web:
    image: ghcr.io/hoarder-app/hoarder:release
    restart: unless-stopped
    container_name: hoarder
    volumes:
      - /opt/mydocker/hoarder/data:/data
    ports:
      - 54110:3000
    env_file:
      - .env
    networks:
      - traefiknet
    labels:
      - traefik.docker.network=traefiknet
      - traefik.enable=true
      - traefik.http.routers.hoarder.rule=Host(`hoarder.my.domain`)
      - traefik.http.routers.hoarder.entrypoints=http,https
      - traefik.http.routers.hoarder.priority=10
      - traefik.http.routers.hoarder.tls=true
      - traefik.http.services.hoarder.loadbalancer.server.port=3000
      - traefik.http.routers.hoarder.tls.certresolver=mycloudflare

  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    container_name: chrome
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
    networks:
      - traefiknet

  meilisearch:
    image: getmeili/meilisearch:v1.11.1
    restart: unless-stopped
    container_name: meilisearch
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /opt/mydocker/hoarder/meilisearch:/meili_data
    networks:
      - traefiknet
@Crush-RY can you share the logs from the web container?
Hmmm, it seems like there are multiple people hitting this now. So I'll label this as a bug until we figure out what's going on.
Was anyone running Hoarder before and hit this problem after an upgrade, or are these all new installations?
I've just pushed https://github.com/hoarder-app/hoarder/commit/393d097c965c9bc223e9660b689df6a0312e9222 to log more details on the connection failure reason. It'll take about 15 minutes for the container to be built. Once it's built, can someone switch to the nightly build and capture the error for me?
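For anyone unsure how to switch, a minimal sketch (assuming the nightly image is published under a `nightly` tag, mirroring the `release` tag in the compose file above):

```bash
# Sketch: point the web service at the nightly build and restart it.
# Assumes the image is published as ghcr.io/hoarder-app/hoarder:nightly;
# edit docker-compose.yml and change the web service's image line to:
#   image: ghcr.io/hoarder-app/hoarder:nightly
docker compose pull web
docker compose up -d web

# Then watch the logs for the new connection-failure details
docker compose logs -f web
```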
Sure, here it is:
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
▲ Next.js 14.2.13
- Local: http://localhost:3000
- Network: http://0.0.0.0:3000
✓ Starting...
✓ Ready in 411ms
> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts
2024-11-21T22:48:49.735Z info: Workers version: nightly
2024-11-21T22:48:49.748Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-21T22:48:49.763Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
(process:69): VIPS-WARNING **: 22:49:40.996: threads clipped to 1024
2024-11-21T22:51:10.022Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs: FetchError: request to https://raw.githubusercontent.com/cliqz-oss/adblocker/master/packages/adblocker/assets/easylist/easylist.txt failed, reason: getaddrinfo EAI_AGAIN raw.githubusercontent.com
at ClientRequest.<anonymous> (/app/apps/workers/node_modules/.pnpm/node-fetch@2.7.0/node_modules/node-fetch/lib/index.js:1501:11)
at ClientRequest.emit (node:events:518:28)
at ClientRequest.emit (node:domain:489:12)
at emitErrorEvent (node:_http_client:103:11)
at TLSSocket.socketErrorListener (node:_http_client:506:5)
at TLSSocket.emit (node:events:518:28)
at TLSSocket.emit (node:domain:489:12)
at emitErrorNT (node:internal/streams/destroy:170:8)
at emitErrorCloseNT (node:internal/streams/destroy:129:3)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
2024-11-21T22:51:10.023Z info: Starting crawler worker ...
2024-11-21T22:51:10.025Z info: Starting inference worker ...
2024-11-21T22:51:10.026Z info: Starting search indexing worker ...
2024-11-21T22:51:10.027Z info: Starting tidy assets worker ...
2024-11-21T22:51:10.028Z info: Starting video worker ...
2024-11-21T22:51:10.029Z info: Starting feed worker ...
2024-11-21T22:51:10.171Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:10.171Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:15.023Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
2024-11-21T22:51:15.174Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:15.224Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:15.249Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:15.249Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:20.173Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
at node:internal/deps/undici/undici:13392:13
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:20.251Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:20.251Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:20.273Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:20.273Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:25.274Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:25.275Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:25.303Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:25.303Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:30.214Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
at node:internal/deps/undici/undici:13392:13
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:30.304Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:30.304Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:30.325Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:30.326Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:35.326Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:35.327Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-21T22:51:35.373Z info: [Crawler][19] Will crawl "https://www.wikipedia.org/" for link with id "nhn7o9njs3vz8khu5evsl1l8"
2024-11-21T22:51:35.374Z info: [Crawler][19] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-21T22:51:40.243Z error: [search][20] search job failed: MeiliSearchCommunicationError: fetch failed
MeiliSearchCommunicationError: fetch failed
at node:internal/deps/undici/undici:13392:13
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
2024-11-21T22:51:40.374Z error: [Crawler][19] Failed to determine the content-type for the url https://www.wikipedia.org/: TimeoutError: The operation was aborted due to timeout
2024-11-21T22:51:40.374Z error: [Crawler][19] Crawling job failed: AssertionError [ERR_ASSERTION]: undefined == true
AssertionError [ERR_ASSERTION]: undefined == true
at crawlPage (/app/apps/workers/crawlerWorker.ts:3:1083)
at crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7502)
at runCrawler (/app/apps/workers/crawlerWorker.ts:3:10635)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
Hope that helps.
And to answer your first question: I've only been running Hoarder for a few days, and I've had this error from the beginning.
Yeah, this is actually very helpful. I think I know how I can fix that!
So basically what's happening here is that, for one reason or another (it might be your network policies, GitHub being blocked, etc.), Hoarder is failing to download the adblock list used in the crawler. I've sent https://github.com/hoarder-app/hoarder/commit/378ad9bc157fb7741e09cdb687a97c82c2851578 to ensure that this doesn't block worker startup. And in your case, you might also want to set CRAWLER_ENABLE_ADBLOCKER=false so that you don't block the startup of the worker each time, given that the download is always failing. Can you give it a try once the container is built?
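For reference, applying that setting could look like the following sketch; `CRAWLER_ENABLE_ADBLOCKER` comes from the comment above, and the service name `web` matches the compose file shared earlier:

```bash
# Disable the adblock-list download in the crawler
echo 'CRAWLER_ENABLE_ADBLOCKER=false' >> .env

# Recreate the web container so the updated .env is picked up
docker compose up -d --force-recreate web
```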
Thanks again for your very quick answer. I tried the fix you pushed and also added the line you suggested to the .env file, but unfortunately, it still does not work.
Here are the logs:
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service init-db-migration: starting
Running db migration script
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service init-db-migration successfully started
s6-rc: info: service svc-workers: starting
s6-rc: info: service svc-web: starting
s6-rc: info: service svc-workers successfully started
s6-rc: info: service svc-web successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
▲ Next.js 14.2.13
- Local: http://localhost:3000
- Network: http://0.0.0.0:3000
✓ Starting...
✓ Ready in 358ms
> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts
(node:69) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.170Z info: Workers version: nightly
2024-11-22T00:50:20.182Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:121) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-11-22T00:50:20.199Z info: [Crawler] Successfully resolved IP address, new address: http://172.21.0.3:9222/
2024-11-22T00:50:20.324Z info: Starting crawler worker ...
2024-11-22T00:50:20.325Z info: Starting inference worker ...
2024-11-22T00:50:20.325Z info: Starting search indexing worker ...
2024-11-22T00:50:20.326Z info: Starting tidy assets worker ...
2024-11-22T00:50:20.326Z info: Starting video worker ...
2024-11-22T00:50:20.326Z info: Starting feed worker ...
2024-11-22T00:50:20.365Z info: [Crawler][22] Will crawl "https://www.wikipedia.org/" for link with id "m2wi6yovvkafmnjegdic7b6c"
2024-11-22T00:50:20.365Z info: [Crawler][22] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:20.462Z info: [search][23] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:20.594Z info: [search][23] Completed successfully
2024-11-22T00:50:25.370Z error: [Crawler][22] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
s [TRPCError]: Bookmark not found
at /app/apps/web/.next/server/chunks/6815.js:1:16914
at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
at async t (/app/apps/web/.next/server/chunks/440.js:4:32333)
at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
at async a (/app/apps/web/.next/server/chunks/440.js:4:32960)
at async t (/app/apps/web/.next/server/chunks/440.js:4:33299)
at async /app/apps/web/.next/server/app/api/trpc/[trpc]/route.js:1:4379
at async Promise.all (index 1) {
code: 'NOT_FOUND',
[cause]: undefined
}
2024-11-22T00:50:34.670Z info: [search][24] Attempting to index bookmark with id m2wi6yovvkafmnjegdic7b6c ...
2024-11-22T00:50:34.758Z info: [search][24] Completed successfully
2024-11-22T00:50:35.751Z error: [Crawler][22] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.772Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.790Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.807Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.826Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:35.847Z error: [Crawler][22] Crawling job failed: Error: The bookmark either doesn't exist or is not a link
Error: The bookmark either doesn't exist or is not a link
at getBookmarkDetails (/app/apps/workers/workerUtils.ts:2:1575)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10078)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
(process:69): VIPS-WARNING **: 00:50:42.563: threads clipped to 1024
2024-11-22T00:50:42.890Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:42.891Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:50:42.909Z info: [search][26] Attempting to index bookmark with id c77a1dclbtoswxfg1dehix2z ...
2024-11-22T00:50:42.989Z info: [search][26] Completed successfully
2024-11-22T00:50:47.893Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:50:58.038Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:50:58.060Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:50:58.060Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:03.061Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
2024-11-22T00:51:13.201Z error: [Crawler][25] Crawling job failed: Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
Error: net::ERR_NAME_NOT_RESOLVED at https://www.wikipedia.org/
at navigate (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:171:27)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async Deferred.race (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/util/Deferred.js:36:20)
at async CdpFrame.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Frame.js:137:25)
at async CdpPage.goto (/app/apps/workers/node_modules/.pnpm/puppeteer-core@22.3.0/node_modules/puppeteer-core/lib/cjs/puppeteer/api/Page.js:590:20)
at async crawlPage (/app/apps/workers/crawlerWorker.ts:3:1456)
at async crawlAndParseUrl (/app/apps/workers/crawlerWorker.ts:3:7607)
at async runCrawler (/app/apps/workers/crawlerWorker.ts:3:10740)
at async Object.run (/app/apps/workers/utils.ts:2:1459)
at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
2024-11-22T00:51:13.224Z info: [Crawler][25] Will crawl "https://www.wikipedia.org/" for link with id "c77a1dclbtoswxfg1dehix2z"
2024-11-22T00:51:13.224Z info: [Crawler][25] Attempting to determine the content-type for the url https://www.wikipedia.org/
2024-11-22T00:51:18.225Z error: [Crawler][25] Failed to determine the content-type for the url https://www.wikipedia.org/: AbortError: The operation was aborted.
OK, now it's clear that you have some DNS/internet problems in the container :) Basically, your container can't resolve DNS, and that's required for the crawler to work. This is not a Hoarder problem at this point.
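One way to confirm that is to test resolution from inside the containers. A sketch, assuming the container names from the compose file above and that the images ship busybox-style `nslookup`/`wget` (which may not be the case in every base image):

```bash
# Test DNS resolution from the hoarder container (the workers run here)
docker exec hoarder sh -c 'nslookup www.wikipedia.org || echo "DNS lookup failed"'

# Test an actual HTTPS fetch as well
docker exec hoarder sh -c 'wget -qO /dev/null https://www.wikipedia.org && echo OK || echo FAILED'

# The chrome container performs the page loads, so check it too
docker exec chrome sh -c 'nslookup www.wikipedia.org || echo "DNS lookup failed"'
```

If the lookups fail, pinning explicit resolvers on the services via the compose `dns:` option is one thing to try, keeping in mind that public resolvers may themselves be unreachable from some networks.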
As you know, I am in China.
Does Hoarder access any API that I cannot reach?
Describe the Bug
https://docs.hoarder.app/Installation/docker
I tried to run Hoarder with Docker Compose, but it failed.
Steps to Reproduce
1. Create .env
2. Create docker-compose.yml, declaring the named volumes:
   volumes:
     meilisearch:
     data:
3. docker compose up -d