Open icsy7867 opened 9 months ago
Also, here are a couple of short snippets of what I am seeing. And this is in a brand-new session, with no previous context or history.
The beginning seems OK, though it could be a little faster...
And then here is a short snippet at the end. The longer it goes, the slower it gets...
Here are some other related issues I have found.
https://github.com/gradio-app/gradio/issues/6847
https://github.com/gradio-app/gradio/pull/7084
https://github.com/gradio-app/gradio/pull/7113
https://github.com/gradio-app/gradio/pull/7102
Also, I have tried this in a Docker container and directly on a RHEL Linux host, with the same results.
While I think the Gradio clients should be updated, I have found a resolution to this:
It seems backwards, but if you add a small sleep while the deltas from the stream are being iterated, it stops choking the single-threaded Python process:
def _chat(self, message: str, history: list[list[str]], mode: str, *_: Any) -> Any:
    def yield_deltas(completion_gen: CompletionGen) -> Iterable[str]:
        full_response: str = ""
        stream = completion_gen.response
        for delta in stream:
            if isinstance(delta, str):
                full_response += str(delta)
            elif isinstance(delta, ChatResponse):
                full_response += delta.delta or ""
            yield full_response
            time.sleep(0.025)  # requires `import time`; brief pause so the server can flush each update to the client
Check out the difference!
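The same idea can be sketched generically, outside the privateGPT codebase (the function names here are illustrative, not from any library): a small wrapper throttles any delta stream, and a second generator accumulates deltas into the growing response string a chat UI expects.

import time
from typing import Iterable, Iterator


def throttle(stream: Iterable[str], delay: float = 0.025) -> Iterator[str]:
    # Yield each item, then sleep briefly so the single-threaded
    # server gets a chance to push the update out to the client
    # instead of being saturated by back-to-back yields.
    for item in stream:
        yield item
        time.sleep(delay)


def accumulate(deltas: Iterable[str]) -> Iterator[str]:
    # Build up the full response incrementally; each yield is the
    # entire text so far, which is what a chat UI typically renders.
    full = ""
    for delta in deltas:
        full += delta
        yield full

A handler would then yield from throttle(accumulate(stream)) rather than from the raw stream; with delay=0 the wrapper is a no-op, so the cost is tunable.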
Created a pull request: https://github.com/imartinez/privateGPT/pull/1589
I am running into an issue with the stream. The GPU process seems to work just fine, but when query_docs streams to the web console, it gets unbearably slow (no history, ~400-500 words in the context max).
I stumbled onto this, which sounds like what I am running into: https://github.com/gradio-app/gradio/issues/7086
I believe the solution would be to upgrade to the latest version of Gradio.