dimatx opened 1 month ago

I seem to have a URL that gets Hoarder stuck in a loop: it tries to crawl, then recrawls, and so on. It only stops when I delete the bookmark. Please let me know if you need any more info than what I've provided.

I have a similar problem: really old bookmarks from now-defunct websites and apps. When I try to search their URLs in Hoarder, I get no results, so I have no easy way to find and delete them.
I tried adding the URL "https://odsonfinance.com/chapter-2b-how-to-do-a-backdoor-roth-with-fidelity-step-by-step-instructions/" and everything works just fine. Are you on the latest version? How are you deploying Hoarder?
Docker Compose, and on the latest version. Any other info I can provide to help troubleshoot, assuming I can reproduce it?
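For the record, pulling and recreating is the standard way to make sure the :release tag is actually current:

```
docker compose pull && docker compose up -d
```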
Any environment variables you have set?
Did you try downloading the full-page archive? That seems to be what's causing the loop; the process never seems to finish.
Here's my docker compose and .env.
```yaml
version: "3.8"
services:
  web:
    image: ghcr.io/hoarder-app/hoarder:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 3200:3000
    env_file:
      - .env
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      OPENAI_API_KEY: *********************************
      DATA_DIR: /data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.6
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data
volumes:
  meilisearch: null
  data: null
networks: {}
```
And the .env:

```
HOARDER_VERSION=release
NEXTAUTH_SECRET=*******************
MEILI_MASTER_KEY=*******************
NEXTAUTH_URL=http://*******************:3200
```
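For anyone watching the same behavior, the loop shows up when tailing the web container's logs (the grep pattern is just a guess at the relevant lines):

```
docker compose logs -f web | grep -iE "crawl|archive"
```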
Try increasing CRAWLER_JOB_TIMEOUT_SEC. The default is 60 seconds; if the full-page archival takes longer than that, it might cause this behavior.
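For example, in the same .env that your compose file already loads:

```
# Give slow full-page archives more room than the 60-second default
CRAWLER_JOB_TIMEOUT_SEC=300
```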
I made it 300 seconds, but the issue persists. Isn't it strange that there's a loop despite no errors or failures in the logs? According to the logs it also burns OpenAI credits on every pass, so it could run up a real bill for anyone who hasn't set a low spending budget in OpenAI.
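A stopgap that should at least cap the spend: comment the key out of the compose file while debugging (assuming Hoarder simply skips AI tagging when no key is set, which is worth double-checking):

```yaml
    environment:
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      # OPENAI_API_KEY: ...  # disabled while debugging the crawl loop
      DATA_DIR: /data
```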
I have a similar issue with "https://www.npopov.com/2022/12/20/This-year-in-LLVM-2022.html", if that's any help.