hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0

Crawler aint crawling [Error when performing the request to....] #218

Closed · Brancliff closed this 1 week ago

Brancliff commented 2 weeks ago

Hey! New user here. I just set everything up, and the main hitch right now is that Hoarder doesn't seem to be able to pull any information from the links I add: no header image, no text, nothing. In the admin panel I have a few "background jobs" queued up, but I've left it like that for a day and it hasn't progressed at all.

I also made sure to copy the links from the demo website, just in case the problem was the links themselves.

The container stack here has quite a few pieces. It's the "workers" container that I'd need to check to troubleshoot this, right? Here's what I keep getting in its container logs:

Internal Error: Error when performing the request to https://registry.npmjs.org/pnpm; for troubleshooting help, see https://github.com/nodejs/corepack#troubleshooting
    at fetch (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22882:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async fetchAsJson (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22896:20)
    at async fetchLatestStableVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22948:20)
    at async fetchLatestStableVersion2 (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22971:14)
    at async Engine.getDefaultVersion (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:23349:25)
    at async executePackageManagerRequest (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24207:28)
    at async BinaryCommand.validateAndExecute (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:21173:22)
    at async _Cli.run (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:22148:18)
    at async Object.runMain (/usr/local/lib/node_modules/corepack/dist/lib/corepack.cjs:24279:12)

... Is that related to this at all? Or am I on the wrong trail entirely here

kamtschatka commented 2 weeks ago

Yes, the worker is responsible for filling everything in with data. It looks like you have connectivity issues (or at least your Docker container does): pnpm, which is used to download all of the worker's dependencies, cannot be set up because the container can't reach the npm registry.
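One quick way to confirm (or rule out) the connectivity theory is to run the same request corepack fails on, from inside the workers container. The container name `hoarder-workers` is an assumption here; check `docker ps` for the real name in your stack:

```shell
# Container name "hoarder-workers" is an assumption; substitute your own.
# Hit the exact URL corepack fails on, from inside the workers container.
docker exec hoarder-workers \
  node -e "fetch('https://registry.npmjs.org/pnpm').then(r => console.log(r.status))"
# A printed 200 means DNS resolution and outbound HTTPS both work;
# anything else (or a hang) points at the container's network, DNS,
# or proxy configuration.
```

Note that a successful `ping` only proves ICMP connectivity; corepack additionally needs DNS resolution and outbound HTTPS, which the command above exercises directly.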

Brancliff commented 2 weeks ago

I got my connection problems sorted out. I was able to ping Google from a shell inside both the web and workers containers, so I know they can reach the internet. Now I'm getting a new error:

Node.js v21.7.3
 ELIFECYCLE  Command failed with exit code 1.
> @hoarder/workers@0.1.0 start:prod /app/apps/workers
> tsx index.ts
2024-06-13T08:13:39.968Z info: Workers version: 0.14.0
2024-06-13T08:13:40.002Z info: [Crawler] Connecting to existing browser instance: http://chrome:9222
(node:35) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
2024-06-13T08:13:40.042Z info: [Crawler] Successfully resolved IP address, new address: http://172.29.44.2:9222/
2024-06-13T08:13:41.212Z info: Starting crawler worker ...
2024-06-13T08:13:41.220Z info: Starting inference worker ...
2024-06-13T08:13:41.231Z info: Starting search indexing worker ...
2024-06-13T08:13:41.559Z error: [Crawler][9] Crawling job failed: SqliteError: no such table: bookmarks
/app/apps/workers/node_modules/.pnpm/better-sqlite3@9.4.3/node_modules/better-sqlite3/lib/methods/wrappers.js:5
    return this[cppdb].prepare(sql, this, false);
                       ^
SqliteError: no such table: bookmarkLinks
    at Database.prepare (/app/apps/workers/node_modules/.pnpm/better-sqlite3@9.4.3/node_modules/better-sqlite3/lib/methods/wrappers.js:5:21)
    at BetterSQLiteSession.prepareQuery (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/better-sqlite3/session.cjs:42:30)
    at BetterSQLiteSession.prepareOneTimeQuery (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/session.cjs:91:17)
    at QueryPromise._prepare (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/query-builders/update.cjs:101:81)
    at QueryPromise.run (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/query-builders/update.cjs:111:17)
    at QueryPromise.execute (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/sqlite-core/query-builders/update.cjs:123:54)
    at QueryPromise.then (/app/apps/workers/node_modules/.pnpm/drizzle-orm@0.29.4_better-sqlite3@9.4.3/node_modules/drizzle-orm/query-promise.cjs:44:17)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 'SQLITE_ERROR'
}      

And after that, this whole error log basically loops.

Some more info: I have 3 test bookmarks and I'm stuck at 3 pending crawling jobs. No indexing jobs, and 3 pending inference jobs.
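A `no such table` error means the workers opened a SQLite file that never had the schema migrations applied to it, which you can verify by listing the tables in the file the workers actually see. The sketch below is generic Python, not part of Hoarder, and the path `/data/db.db` is an assumption; point it at wherever your data dir is mounted:

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()

if __name__ == "__main__":
    # Path is an assumption; adjust to your actual data dir mount.
    print(list_tables("/data/db.db"))
```

If tables like `bookmarks` and `bookmarkLinks` are missing from the output, the workers are opening a different (empty) database file than the one the web container set up.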

MohamedBassem commented 2 weeks ago

This error usually indicates that the data dirs of the workers and web containers are not the same. They should point at the same location, since both containers share the same SQLite database; if the workers open a different (and therefore empty) data dir, every query fails with `no such table`.
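As a sketch, the relevant part of a compose file would mount one shared volume into both services at the same `DATA_DIR` path. The service names, image references, and paths below are illustrative, not taken from this thread; compare against the project's official `docker-compose.yml`:

```yaml
services:
  web:
    image: ghcr.io/hoarder-app/hoarder-web:release   # illustrative image ref
    environment:
      DATA_DIR: /data
    volumes:
      - hoarder-data:/data      # same volume...

  workers:
    image: ghcr.io/hoarder-app/hoarder-workers:release
    environment:
      DATA_DIR: /data           # ...same path inside the container
    volumes:
      - hoarder-data:/data      # ...mounted into both services

volumes:
  hoarder-data:
```

With a single named volume mounted at the same path in both services, the web container and the workers are guaranteed to open the same SQLite file.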