Closed ImVexed closed 7 months ago
Hi :). I assume you're using the default 2k context window in open-webui? Until today, my project used a much larger context window where possible (as in the case of command-r). I just pushed an update containing a new settings window, which lets you adjust the context window. Please confirm whether this causes the increase in VRAM usage / decrease in offloaded layers.
If that's the case, I assume Ollama simply ran out of memory on your system?
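For reference, Ollama accepts a per-request context window via the `options.num_ctx` field of `/api/chat`. A minimal sketch of building such a request body — the model name and the example values here are assumptions, not taken from the project's actual code:

```python
import json


def build_chat_payload(model: str, messages: list, num_ctx: int = 2048) -> dict:
    """Build an Ollama /api/chat request body with an explicit context window.

    Larger num_ctx values increase VRAM usage, which can force Ollama to
    offload fewer layers to the GPU.
    """
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


payload = build_chat_payload(
    "command-r",  # assumed model name
    [{"role": "user", "content": "hello"}],
    num_ctx=8192,  # hypothetical larger window; needs correspondingly more VRAM
)
print(json.dumps(payload, indent=2))
```

Sending this body to `POST http://ollama:11434/api/chat` would then use the requested window instead of the default.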
Yes, it's certainly quicker when I lower the context window size, though it seems to be breaking. It froze for maybe a minute or so while trying to pull info from the internet here:
ollama | [GIN] 2024/04/14 - 00:29:19 | 200 | 3.916807898s | 172.30.0.3 | POST "/api/chat"
searxng-1 | 2024-04-14 00:29:19,854 WARNING:searx.engines.google: ErrorContext('searx/search/processors/online.py', 116, "response = req(params['url'], **request_args)", 'searx.exceptions.SearxEngineTooManyRequestsException', None, ('Too many request',)) False
searxng-1 | 2024-04-14 00:29:19,854 ERROR:searx.engines.google: Too many requests
searxng-1 | Traceback (most recent call last):
searxng-1 | File "/usr/local/searxng/searx/search/processors/online.py", line 163, in search
searxng-1 | search_results = self._search_basic(query, params)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/local/searxng/searx/search/processors/online.py", line 147, in _search_basic
searxng-1 | response = self._send_http_request(params)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/local/searxng/searx/search/processors/online.py", line 116, in _send_http_request
searxng-1 | response = req(params['url'], **request_args)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/local/searxng/searx/network/__init__.py", line 164, in get
searxng-1 | return request('get', url, **kwargs)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/local/searxng/searx/network/__init__.py", line 95, in request
searxng-1 | return future.result(timeout)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
searxng-1 | return self.__get_result()
searxng-1 | ^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
searxng-1 | raise self._exception
searxng-1 | File "/usr/local/searxng/searx/network/network.py", line 289, in request
searxng-1 | return await self.call_client(False, method, url, **kwargs)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/local/searxng/searx/network/network.py", line 272, in call_client
searxng-1 | return Network.patch_response(response, do_raise_for_httperror)
searxng-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
searxng-1 | File "/usr/local/searxng/searx/network/network.py", line 245, in patch_response
searxng-1 | raise_for_httperror(response)
searxng-1 | File "/usr/local/searxng/searx/network/raise_for_httperror.py", line 76, in raise_for_httperror
searxng-1 | raise SearxEngineTooManyRequestsException()
searxng-1 | searx.exceptions.SearxEngineTooManyRequestsException: Too many request, suspended_time=3600
searxng-1 | 2024-04-14 00:29:22,329 ERROR:searx.engines.duckduckgo: engine timeout
searxng-1 | 2024-04-14 00:29:22,423 WARNING:searx.engines.duckduckgo: ErrorContext('searx/engines/duckduckgo.py', 118, 'res = get(query_url)', 'httpx.ConnectTimeout', None, (None, None, 'duckduckgo.com')) False
searxng-1 | 2024-04-14 00:29:22,423 ERROR:searx.engines.duckduckgo: HTTP requests timeout (search duration : 3.0941880460013635 s, timeout: 3.0 s) : ConnectTimeout
backend-1 | 2024/04/14 00:29:22 WARN Error downloading website error="no content found"
After that it went through, but then got stuck in a loop:
Here are the full logs: temp.log
I'm pretty sure that it ran out of context. 2k tokens isn't much. You can see an estimate of the current context in the backend logs. I assume the format instructions aren't in the context anymore at this point, which results in the LLM ignoring the requested structure.
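As a rough illustration of how quickly a 2k-token window fills up, here is a sketch using the common 4-characters-per-token heuristic — the ratio is an approximation, not the model's real tokenizer, and the function names are made up for illustration:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def fits_in_context(history: list, num_ctx: int = 2048) -> bool:
    """Check whether the combined chat history fits in the context window.

    Once the total exceeds num_ctx, the oldest content (often the system
    prompt with the format instructions) is effectively pushed out.
    """
    return sum(estimate_tokens(m) for m in history) <= num_ctx


# A single scraped web page of ~10,000 characters already blows past 2k tokens:
page = "x" * 10_000
print(estimate_tokens(page), fits_in_context([page], num_ctx=2048))
```

Under this heuristic, one medium-sized search result alone can exceed the default window, which matches the behavior above.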
Closing in favor of #91.
Describe the bug
When submitting a question:
Exiting chain with error: Post "http://ollama:11434/api/chat": EOF
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Not to break.
Additional context
I think this may be some sort of timeout issue? I note it only happens to me with Command-R. I can use Command-R (18.8 GB) fine in Ollama's web UI, with 39/41 layers offloaded to the GPU (3090, 24 GB). But when I use LLocalSearch, I only see 19/41 layers offloaded. Not sure if that has anything to do with it, but it confuses me, since when I use Mixtral-8x7B (19 GB) it loads all layers to the GPU and has no issues with LLocalSearch.