nilsherzig / LLocalSearch

LLocalSearch is a completely locally running search aggregator using LLM agents. The user can ask a question, and the system uses a chain of LLMs to find the answer. The user can see the agents' progress and the final answer. No OpenAI or Google API keys are needed.
Apache License 2.0

Exiting Chain EOF #85

Closed: ImVexed closed this 7 months ago

ImVexed commented 7 months ago

Describe the bug
When submitting a question: Exiting chain with error: Post "http://ollama:11434/api/chat": EOF

To Reproduce
Steps to reproduce the behavior:

  1. Enter a message
  2. EOF

Expected behavior
The chain should not break.

Screenshots
[screenshot]

Additional context
I think this may be some sort of timeout issue? It only happens to me with Command-R. I can use Command-R (18.8 GB) fine in Ollama's Web UI, with 39/41 layers offloaded to the GPU (3090, 24 GB). But when I use LLocalSearch, I only see 19/41 layers offloaded. Not sure if that has anything to do with it, but it is confusing, since Mixtral-8x7B (19 GB) loads all layers onto the GPU and has no issues with LLocalSearch.
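One way to narrow this down is to hit the endpoint from the error message directly, bypassing LLocalSearch entirely. A minimal Go sketch for illustration, not project code (host, port, and model name are taken from the report above; the request shape follows Ollama's public /api/chat API):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The same endpoint the backend hits when the chain exits with EOF.
	body, err := json.Marshal(map[string]any{
		"model": "command-r",
		"messages": []map[string]string{
			{"role": "user", "content": "Say hello"},
		},
		"stream": false,
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://ollama:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		// An EOF here points at Ollama itself (e.g. the runner process
		// dying while serving the request), not at LLocalSearch.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```

If the same EOF shows up here, the problem lies with Ollama's handling of this model/context combination rather than with the agent chain.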

nilsherzig commented 7 months ago

Hi :). I assume you're using the default 2k context window on open-webui? Until today, my project used a much larger context window when possible (as in the case of Command-R). I just pushed an update that adds a settings window, which allows you to adjust the context window. Please confirm whether the larger context window was causing the increased VRAM usage / decrease in offloaded layers.
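For reference, the window size is passed to Ollama per request through the num_ctx option. A hedged sketch of what such a request body looks like (the field names come from Ollama's public API; the exact wiring inside LLocalSearch may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest mirrors the JSON body that Ollama's /api/chat accepts.
type chatRequest struct {
	Model    string         `json:"model"`
	Messages []chatMessage  `json:"messages"`
	Options  map[string]any `json:"options,omitempty"`
}

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

func main() {
	// num_ctx is the context window. A larger window grows the KV cache
	// Ollama keeps in VRAM, which would explain fewer layers fitting on
	// a 24 GB GPU (39/41 offloaded at 2k vs. 19/41 in the report above).
	req := chatRequest{
		Model:    "command-r",
		Messages: []chatMessage{{Role: "user", Content: "Hello"}},
		Options:  map[string]any{"num_ctx": 2048},
	}
	out, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(out))
}
```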

nilsherzig commented 7 months ago

If that's the case, I assume Ollama just ran out of memory on your system?

ImVexed commented 7 months ago

Yes, it's certainly quicker when I lower the context window size, though it still seems to be breaking. It froze for maybe a minute or so while trying to pull info from the internet here:

ollama        | [GIN] 2024/04/14 - 00:29:19 | 200 |  3.916807898s |      172.30.0.3 | POST     "/api/chat"
searxng-1     | 2024-04-14 00:29:19,854 WARNING:searx.engines.google: ErrorContext('searx/search/processors/online.py', 116, "response = req(params['url'], **request_args)", 'searx.exceptions.SearxEngineTooManyRequestsException', None, ('Too many request',)) False
searxng-1     | 2024-04-14 00:29:19,854 ERROR:searx.engines.google: Too many requests
searxng-1     | Traceback (most recent call last):
searxng-1     |   File "/usr/local/searxng/searx/search/processors/online.py", line 163, in search
searxng-1     |     search_results = self._search_basic(query, params)
searxng-1     |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/local/searxng/searx/search/processors/online.py", line 147, in _search_basic
searxng-1     |     response = self._send_http_request(params)
searxng-1     |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/local/searxng/searx/search/processors/online.py", line 116, in _send_http_request
searxng-1     |     response = req(params['url'], **request_args)
searxng-1     |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/local/searxng/searx/network/__init__.py", line 164, in get
searxng-1     |     return request('get', url, **kwargs)
searxng-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/local/searxng/searx/network/__init__.py", line 95, in request
searxng-1     |     return future.result(timeout)
searxng-1     |            ^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
searxng-1     |     return self.__get_result()
searxng-1     |            ^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
searxng-1     |     raise self._exception
searxng-1     |   File "/usr/local/searxng/searx/network/network.py", line 289, in request
searxng-1     |     return await self.call_client(False, method, url, **kwargs)
searxng-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/local/searxng/searx/network/network.py", line 272, in call_client
searxng-1     |     return Network.patch_response(response, do_raise_for_httperror)
searxng-1     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1     |   File "/usr/local/searxng/searx/network/network.py", line 245, in patch_response
searxng-1     |     raise_for_httperror(response)
searxng-1     |   File "/usr/local/searxng/searx/network/raise_for_httperror.py", line 76, in raise_for_httperror
searxng-1     |     raise SearxEngineTooManyRequestsException()
searxng-1     | searx.exceptions.SearxEngineTooManyRequestsException: Too many request, suspended_time=3600
searxng-1     | 2024-04-14 00:29:22,329 ERROR:searx.engines.duckduckgo: engine timeout
searxng-1     | 2024-04-14 00:29:22,423 WARNING:searx.engines.duckduckgo: ErrorContext('searx/engines/duckduckgo.py', 118, 'res = get(query_url)', 'httpx.ConnectTimeout', None, (None, None, 'duckduckgo.com')) False
searxng-1     | 2024-04-14 00:29:22,423 ERROR:searx.engines.duckduckgo: HTTP requests timeout (search duration : 3.0941880460013635 s, timeout: 3.0 s) : ConnectTimeout
backend-1     | 2024/04/14 00:29:22 WARN Error downloading website error="no content found"

And after that went through, it then got stuck in a loop: [screenshot]

Here are the full logs: temp.log

nilsherzig commented 7 months ago

I'm pretty sure it ran out of context. 2k tokens isn't much. You can see an estimate of the current context size in the backend logs. I assume the format instructions aren't in the context anymore at this point, which results in the LLM ignoring the requested structure.
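To see how quickly 2k tokens disappears, a back-of-the-envelope check helps. A rough sketch using the common ~4 characters per token heuristic (the character counts are made up for illustration):

```go
package main

import "fmt"

// estimateTokens is a rough heuristic: roughly 4 characters per token for
// English text. Real tokenizers vary, but it works for a budget check.
func estimateTokens(chars int) int {
	return chars / 4
}

func main() {
	const numCtx = 2048 // the default window discussed above

	// Illustrative character counts (made up for this example): a system
	// prompt carrying the format instructions, plus scraped search results.
	instructions := estimateTokens(4000)   // ~1000 tokens of instructions
	searchResults := estimateTokens(20000) // ~5000 tokens of page content

	fmt.Printf("~%d tokens needed vs. a %d-token window\n",
		instructions+searchResults, numCtx)
	// Once the window overflows, the oldest content (typically the format
	// instructions at the start of the prompt) is truncated first, and the
	// model stops following the requested output structure.
}
```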

nilsherzig commented 7 months ago

Closing; see #91.