[Crawler] Failed to connect to the browser instance, will retry in 5 secs

geosmart commented 2 days ago

Describe the Bug

hoarder-web-1 | 2024-10-20T13:17:11.013Z info: Workers version: 0.18.0
hoarder-web-1 | 2024-10-20T13:17:11.032Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1 | 2024-10-20T13:17:11.033Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1 | (node:140) [DEP0040] DeprecationWarning: The punycode module is deprecated. Please use a userland alternative instead.
hoarder-web-1 | (Use node --trace-deprecation ... to show where the warning was created)
hoarder-web-1 | 2024-10-20T13:17:12.714Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
hoarder-web-1 | 2024-10-20T13:17:12.715Z info: Starting crawler worker ...
hoarder-web-1 | 2024-10-20T13:17:12.716Z info: Starting inference worker ...
hoarder-web-1 | 2024-10-20T13:17:12.716Z info: Starting search indexing worker ...
hoarder-web-1 | 2024-10-20T13:17:12.717Z info: Starting tidy assets worker ...
hoarder-web-1 | 2024-10-20T13:17:17.716Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1 | 2024-10-20T13:17:17.717Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1 | 2024-10-20T13:17:19.378Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
hoarder-web-1 | 2024-10-20T13:17:24.380Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1 | 2024-10-20T13:17:24.380Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1 | 2024-10-20T13:17:25.947Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
hoarder-web-1 | 2024-10-20T13:17:30.948Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1 | 2024-10-20T13:17:30.948Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1 | 2024-10-20T13:17:32.615Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
hoarder-web-1 | 2024-10-20T13:17:36.265Z info: [Crawler][4] Will crawl "https://docs.hoarder.app/configuration" for link with id "vrmnjh84tvaby79xbbsl6l1c"
hoarder-web-1 | 2024-10-20T13:17:36.265Z info: [Crawler][4] Attempting to determine the content-type for the url https://docs.hoarder.app/configuration
hoarder-web-1 | 2024-10-20T13:17:37.616Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1 | 2024-10-20T13:17:37.616Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1 | 2024-10-20T13:17:39.223Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs

Steps to Reproduce

.env

HOARDER_VERSION=release
NEXTAUTH_SECRET=xxxxxx
MEILI_MASTER_KEY=IMBU2dG9d5I5s6
NEXTAUTH_URL=http://xxxxxx.duckdns.org:3000

version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - /mnt/mind/data/hoarder/data:/data
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://192.168.68.100:17700
      BROWSER_WEB_URL: http://192.168.68.100:9222
      HTTPS_PRXY: http://192.168.68.100:1081
      DATA_DIR: /data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    ports:
      - 9222:9222
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.6
    restart: unless-stopped
    ports:
      - 17700:7700
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /mnt/mind/data/hoarder/meili:/meili_data

Expected Behaviour

The host IP is 192.168.68.100. From inside the web container, the browser endpoint responds fine:

curl http://192.168.68.100:9222/json 
[  ]

So why does the crawler still log "[Crawler] Failed to connect to the browser instance, will retry in 5 secs"?

Screenshots or Additional Context

No response

Device Details

No response

Exact Hoarder Version

v0.18.0

kamtschatka commented 2 days ago

why did you change it from

      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222

to your current setup?

geosmart commented 1 day ago

why did you change it from
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
to your current setup?

I plan to deploy Meilisearch separately, so I changed it to an IP address.

Now http://meilisearch:17700 is working, and http://192.168.68.100:17700 is also working.

But the Chrome crawler is not working: it resolves the IP but cannot connect to the browser instance, and I don't know why.

hoarder-web-1  | 2024-10-21T18:57:58.011Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1  | 2024-10-21T18:57:58.012Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1  | 2024-10-21T18:57:59.582Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
hoarder-web-1  | 2024-10-21T18:58:04.583Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1  | 2024-10-21T18:58:04.583Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/
hoarder-web-1  | 2024-10-21T18:58:06.147Z error: [Crawler] Failed to connect to the browser instance, will retry in 5 secs
hoarder-web-1  | 2024-10-21T18:58:11.148Z info: [Crawler] Connecting to existing browser instance: http://192.168.68.100:9222
hoarder-web-1  | 2024-10-21T18:58:11.148Z info: [Crawler] Successfully resolved IP address, new address: http://192.168.68.100:9222/

geosmart commented 1 day ago

    logger.info(
      `[Crawler] Connecting to existing browser instance: ${serverConfig.crawler.browserWebUrl}`,
    );
    const webUrl = new URL(serverConfig.crawler.browserWebUrl);
    // We need to resolve the ip address as a workaround for https://github.com/puppeteer/puppeteer/issues/2242
    const { address: address } = await dns.promises.lookup(webUrl.hostname);
    webUrl.hostname = address;
    logger.info(
      `[Crawler] Successfully resolved IP address, new address: ${webUrl.toString()}`,
    );

    //  error here
    return puppeteer.connect({
      browserURL: webUrl.toString(),
      defaultViewport,
    });

Why can't puppeteer.connect connect to http://192.168.68.100:9222?
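
One way to narrow this down: as far as I understand, when puppeteer.connect is given a browserURL it first requests /json/version on that address and reads the webSocketDebuggerUrl field from the response, then opens a WebSocket to it. The earlier curl against /json only lists open page targets, so it does not exercise that path. The sketch below is not hoarder's code; the helper names are hypothetical, and it only shows which URL to check by hand and what a usable response body looks like.

```typescript
// Hypothetical helpers for manual debugging, not part of hoarder or puppeteer.
// Assumption: puppeteer resolves browserURL via GET <browserURL>/json/version
// and reads "webSocketDebuggerUrl" from the JSON body.

/** Build the URL that should report the browser-level WebSocket endpoint. */
function versionEndpoint(browserUrl: string): string {
  return new URL("/json/version", browserUrl).toString();
}

/** Extract the WebSocket debugger URL from a /json/version response body. */
function extractWsEndpoint(body: string): string | null {
  try {
    const parsed: unknown = JSON.parse(body);
    // An array (such as the "[ ]" returned by /json) is not a version object.
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    const ws = (parsed as Record<string, unknown>)["webSocketDebuggerUrl"];
    return typeof ws === "string" ? ws : null;
  } catch {
    // Not JSON at all, for example an HTML error page from something in between.
    return null;
  }
}
```

Running curl from inside the web container against the URL returned by versionEndpoint("http://192.168.68.100:9222"), and checking that the body contains a webSocketDebuggerUrl reachable from that container, would show whether the failure is in the HTTP step or in the WebSocket step.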

geosmart commented 5 hours ago

@MohamedBassem I found that my chrome container logs some errors:

docker logs  hoarder-chrome-1 
[1022/034016.962076:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[1022/034017.253707:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[1022/034017.253862:ERROR:bus.cc(407)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
[1022/034017.306863:WARNING:dns_config_service_linux.cc(427)] Failed to read DnsConfig.
[1022/034019.172004:INFO:policy_logger.cc(145)] :components/policy/core/common/config_dir_policy_loader.cc(118) Skipping mandatory platform policies because no policy file was found at: /etc/chromium/policies/managed
[1022/034019.172056:INFO:policy_logger.cc(145)] :components/policy/core/common/config_dir_policy_loader.cc(118) Skipping recommended platform policies because no policy file was found at: /etc/chromium/policies/recommended
[1022/034019.352144:WARNING:dns_config_service_linux.cc(427)] Failed to read DnsConfig.

DevTools listening on ws://0.0.0.0:9222/devtools/browser/dcf87fc8-ed86-4bcb-a020-cafe51606133
[1022/034019.440406:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[1022/034020.315840:WARNING:sandbox_linux.cc(418)] InitializeSandbox() called with multiple threads in process gpu-process.
[1022/035519.442999:INFO:policy_logger.cc(145)] :components/policy/core/common/config_dir_policy_loader.cc(118) Skipping mandatory platform policies because no policy file was found at: /etc/chromium/policies/managed

Could this be what makes hoarder fail to connect to the browser instance?
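
For reading a log like the one above: the "DevTools listening on ws://..." line suggests Chrome did start its remote debugging server, and the bus.cc messages concern D-Bus, which is usually absent in containers and is commonly reported as non-fatal (an assumption here, not verified for this setup). A minimal sketch, with a hypothetical function name, that checks captured docker logs output for that line:

```typescript
// Hypothetical helper, not part of hoarder: scan Chrome's log output for the
// line announcing the DevTools WebSocket endpoint.
function findDevToolsEndpoint(log: string): string | null {
  // The "m" flag lets ^ and $ match at each line boundary of the log.
  const match = log.match(/^DevTools listening on (ws:\/\/\S+)$/m);
  return match?.[1] ?? null;
}
```

If this returns an endpoint, the browser itself is probably up, and the connection problem is more likely between the web container and port 9222 than inside Chrome.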
