zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

Update gradio and gradio_client #1581

Open icsy7867 opened 9 months ago

icsy7867 commented 9 months ago

I am running into an issue with the stream. The GPU processing seems to work just fine, but when the query_docs response streams to the web console, it gets unbearably slow (no history, ~400-500 words in the context at most).

I stumbled onto this issue, which sounds like what I am running into: https://github.com/gradio-app/gradio/issues/7086

I believe the solution is to update to the latest version of gradio.
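
To confirm which versions are actually in play, a quick check from the environment running private-gpt (assuming both packages expose `__version__`, which recent releases do):

    # Print the installed gradio and gradio_client versions (assumes both
    # packages are importable in the active environment).
    import gradio
    import gradio_client

    print("gradio:", gradio.__version__)
    print("gradio_client:", gradio_client.__version__)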

icsy7867 commented 9 months ago

Also, here are a couple of short clips of what I am seeing, in a brand new session with no previous context or history.

The beginning seems OK, though it could be a little faster... Screen Recording - Beginning

And then here is a short clip from the end. The longer it goes, the slower it gets...

Screen Recording - End

Here are some other related issues I have found.
https://github.com/gradio-app/gradio/issues/6847
https://github.com/gradio-app/gradio/pull/7084
https://github.com/gradio-app/gradio/pull/7113
https://github.com/gradio-app/gradio/pull/7102

Also, I have tried this in a Docker container and directly on a RHEL Linux host, with the same results.

icsy7867 commented 9 months ago

While I still think gradio and gradio_client should be updated, I have found a workaround for this:

It seems backwards, but adding a small sleep where the deltas from the stream are yielded stops it from choking the single-threaded Python process:

    def _chat(self, message: str, history: list[list[str]], mode: str, *_: Any) -> Any:
        def yield_deltas(completion_gen: CompletionGen) -> Iterable[str]:
            full_response: str = ""
            stream = completion_gen.response
            for delta in stream:
                if isinstance(delta, str):
                    full_response += str(delta)
                elif isinstance(delta, ChatResponse):
                    full_response += delta.delta or ""
                yield full_response
                # Brief pause between yields so the UI can keep up with the
                # stream of partial responses (requires `import time` at the
                # top of the module).
                time.sleep(0.025)

Check out the difference!

Screen Recording 2024-02-07 at 6 30 29 PM
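
For anyone who wants to see the throttling idea in isolation, here is a minimal standalone sketch (illustrative only, not private-gpt code, using a fake word-by-word "token stream"): a Gradio ChatInterface generator that yields a growing partial response and sleeps briefly between yields.

    # Minimal sketch of the same pattern: stream partial responses from a
    # generator and pause briefly between yields so the UI can keep up.
    import time
    from collections.abc import Iterator

    import gradio as gr


    def stream_reply(message: str, history: list[list[str]]) -> Iterator[str]:
        full_response = ""
        for token in message.split():  # stand-in for an LLM token stream
            full_response += token + " "
            yield full_response        # Gradio re-renders the chat on every yield
            time.sleep(0.025)          # same small pause as the workaround above


    gr.ChatInterface(stream_reply).launch()

Without the pause, every yield pushes another UI update as fast as the loop can run; the short sleep appears to give the server time to flush each update before the next one arrives, which lines up with the upstream gradio issues linked above.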

icsy7867 commented 9 months ago

Created a pull request: https://github.com/imartinez/privateGPT/pull/1589