Open jbvioix opened 2 weeks ago
This makes a lot of sense. Can you share with me the "timeout" logs that you're getting? I want to know where exactly we're timing out to make it configurable. Is it that the background job itself times out, or is it the call to ollama that times out?
With GPU enabled, I've got these logs:
workers-1 | 2024-06-15T07:16:31.518Z info: [inference][99] Starting an inference job for bookmark with id "ouhm96clwfw25pkbmdcrlj3o"
ollama-1 | [GIN] 2024/06/15 - 07:17:16 | 200 | 44.993621295s | 172.25.0.7 | POST "/api/chat"
workers-1 | 2024-06-15T07:17:16.537Z info: [inference][99] Inferring tag for bookmark "ouhm96clwfw25pkbmdcrlj3o" used 1656 tokens and inferred: Python,History,ProgrammingLanguage,ComputerScience,DevelopmentEnvironment
workers-1 | 2024-06-15T07:17:16.584Z info: [inference][99] Completed successfully
Perfect job, no problem. If I disable the GPU, I get this:
workers-1 | 2024-06-15T07:19:46.715Z info: [inference][100] Starting an inference job for bookmark with id "uxr3yjtfke0tu2u800jbh9rj"
ollama-1 | [GIN] 2024/06/15 - 07:24:47 | 200 | 5m1s | 172.25.0.7 | POST "/api/chat"
workers-1 | 2024-06-15T07:24:47.971Z error: [inference][100] inference job failed: TypeError: fetch failed
...
ollama-1 | [GIN] 2024/06/15 - 07:29:49 | 200 | 5m1s | 172.25.0.7 | POST "/api/chat"
workers-1 | 2024-06-15T07:29:49.832Z error: [inference][100] inference job failed: TypeError: fetch failed
workers-1 | 2024-06-15T07:29:50.926Z info: [inference][100] Starting an inference job for bookmark with id "uxr3yjtfke0tu2u800jbh9rj"
...
ollama-1 | [GIN] 2024/06/15 - 07:34:52 | 200 | 5m1s | 172.25.0.7 | POST "/api/chat"
workers-1 | 2024-06-15T07:34:52.254Z error: [inference][100] inference job failed: TypeError: fetch failed
...
After the first failure, a new inference job is automatically launched. There are about 5 minutes between each attempt, so I think it's a timeout somewhere...
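The `5m1s` durations line up with Node's built-in `fetch` (undici), whose default headers/body timeouts are 300 seconds, so the worker's request likely aborts while Ollama is still generating on CPU. A minimal sketch of making that window configurable, assuming the env var name `INFERENCE_FETCH_TIMEOUT_SEC` and the `chatWithOllama` helper are hypothetical placeholders, not the project's actual code:

```typescript
// Parse the (hypothetical) timeout env var; fall back to the 5-minute
// default that matches undici's built-in headers timeout.
export function resolveTimeoutMs(raw: string | undefined): number {
  const sec = Number(raw ?? "300");
  return (Number.isFinite(sec) && sec > 0 ? sec : 300) * 1000;
}

// Sketch of the worker's call to Ollama with a client-side abort window.
async function chatWithOllama(body: unknown): Promise<unknown> {
  const resp = await fetch("http://ollama:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
    // Abort if no response arrives within the configured window.
    signal: AbortSignal.timeout(
      resolveTimeoutMs(process.env.INFERENCE_FETCH_TIMEOUT_SEC),
    ),
  });
  return resp.json();
}
```

Note that undici's own `headersTimeout`/`bodyTimeout` would still cap the request at 5 minutes unless they are also raised (e.g. via a custom `Agent`), so the env-var signal alone may not be sufficient.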
I've successfully tried Ollama on GPU to generate keywords. However, when I use it on a CPU, I get no results. I've done a few tests in Python: the calculation time on CPU is much longer, but the results are correct. I think there's a timeout somewhere that stops the Ollama task. Is it possible to configure it so that the CPU can be used (on a single-user lightweight server) for labelling?
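If such a knob were added, a CPU-only deployment could raise the limit from the compose file. A hypothetical fragment (the variable name is an assumption, not an existing option of the project):

```yaml
services:
  workers:
    environment:
      # Hypothetical: allow slow CPU inference to run up to 20 minutes.
      - INFERENCE_FETCH_TIMEOUT_SEC=1200
```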