Describe the Bug
I've noticed that whenever an input prompt gets truncated, the inference job fails with a TypeError instead of a parsed response from Ollama.
web-1 | 2024-11-11T20:44:43.789Z info: [inference][718] Starting an inference job for bookmark with id "fwkl0xsdqya1pk6ad9gsb4mn"
web-1 | 2024-11-11T20:44:44.471Z info: [search][812] Attempting to index bookmark with id f5zaz5x63gwtmi7akx0qc012 ...
web-1 | 2024-11-11T20:44:45.058Z info: [search][812] Completed successfully
web-1 | 2024-11-11T20:49:52.445Z info: [inference][718] Inferring tag for bookmark "fwkl0xsdqya1pk6ad9gsb4mn" used 1095 tokens and inferred: <redacted>
web-1 | 2024-11-11T20:49:52.478Z info: [inference][718] Completed successfully
web-1 | 2024-11-11T20:49:52.499Z info: [inference][712] Starting an inference job for bookmark with id "ex2ue3p24p4726elmm5h0qrt"
web-1 | 2024-11-11T20:49:53.286Z info: [search][813] Attempting to index bookmark with id fwkl0xsdqya1pk6ad9gsb4mn ...
web-1 | 2024-11-11T20:49:53.868Z info: [search][813] Completed successfully
web-1 | 2024-11-11T20:54:52.159Z error: [inference][712] inference job failed: TypeError: fetch failed
web-1 | TypeError: fetch failed
web-1 | at node:internal/deps/undici/undici:13392:13
web-1 | at async post (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:114:20)
web-1 | at async Ollama.processStreamableRequest (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:232:25)
web-1 | at async OllamaInferenceClient.runModel (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3206)
web-1 | at async OllamaInferenceClient.inferFromText (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3956)
web-1 | at async inferTagsFromText (/app/apps/workers/openaiWorker.ts:6:3135)
web-1 | at async inferTags (/app/apps/workers/openaiWorker.ts:6:3370)
web-1 | at async Object.runOpenAI [as run] (/app/apps/workers/openaiWorker.ts:6:6792)
web-1 | at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
web-1 | 2024-11-11T20:54:52.189Z info: [inference][721] Starting an inference job for bookmark with id "hkdkt9800t7fdbr4p03waeae"
web-1 | 2024-11-11T20:59:52.273Z error: [inference][721] inference job failed: TypeError: fetch failed
web-1 | TypeError: fetch failed
web-1 | at node:internal/deps/undici/undici:13392:13
web-1 | at async post (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:114:20)
web-1 | at async Ollama.processStreamableRequest (/app/apps/workers/node_modules/.pnpm/ollama@0.5.9/node_modules/ollama/dist/shared/ollama.9c897541.cjs:232:25)
web-1 | at async OllamaInferenceClient.runModel (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3206)
web-1 | at async OllamaInferenceClient.inferFromText (/app/apps/workers/node_modules/.pnpm/@hoarder+shared@file+packages+shared_better-sqlite3@11.3.0/node_modules/@hoarder/shared/inference.ts:2:3956)
web-1 | at async inferTagsFromText (/app/apps/workers/openaiWorker.ts:6:3135)
web-1 | at async inferTags (/app/apps/workers/openaiWorker.ts:6:3370)
web-1 | at async Object.runOpenAI [as run] (/app/apps/workers/openaiWorker.ts:6:6792)
web-1 | at async Runner.runOnce (/app/apps/workers/node_modules/.pnpm/liteque@0.3.0_better-sqlite3@11.3.0/node_modules/liteque/dist/runner.js:2:2578)
web-1 | 2024-11-11T20:59:52.289Z info: [inference][721] Starting an inference job for bookmark with id "hkdkt9800t7fdbr4p03waeae"
web-1 | 2024-11-11T21:00:00.756Z info: [feed] Scheduling feed refreshing jobs ...
This is the corresponding log from Ollama. It's responding successfully (HTTP 200), even on the truncated prompts.
Nov 11 20:45:06 llama ollama[3317]: time=2024-11-11T20:45:06.001Z level=INFO source=server.go:601 msg="llama runner started in 2.26 seconds"
Nov 11 20:49:52 llama ollama[3317]: [GIN] 2024/11/11 - 20:49:52 | 200 | 5m8s | 10.0.0.4 | POST "/api/chat"
Nov 11 20:49:52 llama ollama[3317]: time=2024-11-11T20:49:52.521Z level=WARN source=runner.go:126 msg="truncating input prompt" limit=4096 prompt=4664 numKeep=5
Nov 11 20:54:52 llama ollama[3317]: [GIN] 2024/11/11 - 20:54:52 | 200 | 4m59s | 10.0.0.4 | POST "/api/chat"
Nov 11 20:54:52 llama ollama[3317]: time=2024-11-11T20:54:52.202Z level=WARN source=runner.go:126 msg="truncating input prompt" limit=4096 prompt=8510 numKeep=5
Nov 11 20:59:52 llama ollama[3317]: [GIN] 2024/11/11 - 20:59:52 | 200 | 5m0s | 10.0.0.4 | POST "/api/chat"
Nov 11 20:59:52 llama ollama[3317]: time=2024-11-11T20:59:52.302Z level=WARN source=runner.go:126 msg="truncating input prompt" limit=4096 prompt=8510 numKeep=5
I have the inference job timeout set all the way up to 10 minutes. This is my environment:
HOARDER_VERSION=release
NEXTAUTH_SECRET=<redacted>
MEILI_MASTER_KEY=<redacted>
NEXTAUTH_URL=https://<redacted>
DISABLE_SIGNUPS=true
MEILI_ADDR=http://meilisearch:7700
# Ollama is in a separate LXC on the same Proxmox device
OLLAMA_BASE_URL=http://10.0.0.7:11434
INFERENCE_TEXT_MODEL=mistral
INFERENCE_IMAGE_MODEL=llava
INFERENCE_JOB_TIMEOUT_SEC=600
INFERENCE_CONTEXT_LENGTH=4096
Is there anything I can do to debug this further?
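For reference, here's a rough standalone reproduction I'd imagine could isolate whether the failure comes from Ollama itself or from the Hoarder worker. The host, model, and prompt size are just taken from my setup above; the oversized prompt is a guess at what triggers the truncation path.

```shell
# Sketch: send a deliberately oversized prompt straight to the Ollama chat
# endpoint, bypassing Hoarder, to check whether the truncated request still
# returns parseable JSON. Host and model match the environment above.
PROMPT=$(printf 'word %.0s' $(seq 1 6000))   # well past the 4096-token context limit
curl -sS --max-time 600 http://10.0.0.7:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"mistral\",\"stream\":false,\"messages\":[{\"role\":\"user\",\"content\":\"$PROMPT\"}]}"
```

If this returns valid JSON after the "truncating input prompt" warning appears in the Ollama log, the problem would seem to sit on the Hoarder/fetch side rather than in Ollama's response.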
Steps to Reproduce
Set up a Hoarder instance
Configure environment to point to an Ollama instance
Try tagging bookmarks and observe in the Docker logs that the ones whose prompts exceed the context length fail
Expected Behaviour
Although the prompt is truncated, I would expect some manner of response to be parsed rather than a TypeError
Screenshots or Additional Context
No response
Device Details
Hoarder is running in Docker in a Debian 12 LXC with 2 vCPUs and 8 GB of RAM (alongside multiple other Docker services)
Ollama is running "baremetal" in a Debian 12 LXC with 4 vCPUs and 10 GB of RAM (sufficient for the 7B-parameter models)
Exact Hoarder Version
v0.19.0