Closed. DomStan closed this issue 6 months ago.
Hi @DomStan, thanks for your interest in and support of LLMLingua. The choice depends on how much overhead you can tolerate.
Generally speaking, for chat scenarios: if the topic is relatively fixed and low latency is a hard requirement, Solution 2 is likely the better fit. If the topic varies significantly and you can accept the cost of online compression, Solution 1 may be more suitable and could offer higher performance. Either way, the right choice will depend on your specific scenario.
Best wishes,
Describe the issue
Hey guys, thanks a lot for your great work on prompt compression, really amazing results!
I have a question about a chatbot history compression use case. Could you share your intuition on which method might work better for it:
Thank you!