microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Feature Request]: Token compression using GPT-3.5-turbo #101

Open ohdearquant opened 7 months ago

ohdearquant commented 7 months ago

Is your feature request related to a problem? Please describe.

Local models are too slow at compressing tokens and cannot keep up with larger datasets.

Describe the solution you'd like

Instead of relying only on Hugging Face models, provide API-backed models (e.g., GPT-3.5-turbo) for compression; see the sketch below.
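
For illustration, a minimal sketch of the kind of API-backed call I have in mind. The helper name, instruction wording, and `target_ratio` parameter are hypothetical, and this performs abstractive rewriting through the OpenAI Chat Completions API rather than LLMLingua's perplexity-based token-level compression:

```python
# Hypothetical workaround: ask an API-hosted model to shorten a prompt.
# This is NOT LLMLingua's token-level compression, just an illustrative sketch.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def compress_with_api(prompt: str, target_ratio: float = 0.5,
                      model: str = "gpt-3.5-turbo") -> str:
    """Ask an API model to shorten `prompt` to roughly `target_ratio` of its length."""
    instruction = (
        "Compress the following text so that it keeps all key facts, entities, and numbers "
        f"but uses at most {int(target_ratio * 100)}% of the original tokens. "
        "Return only the compressed text.\n\n" + prompt
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": instruction}],
        temperature=0,
    )
    return response.choices[0].message.content


# Example usage:
# compressed = compress_with_api(long_context, target_ratio=0.3)
```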

Additional context

No response

iofu728 commented 7 months ago

Hi @ohdearquant, thank you for your suggestion. We plan to support API-based models as compressors.

Related issue: #44.

younes-io commented 6 months ago

@iofu728, is there any timeline for when this will land?

iofu728 commented 6 months ago

Hi @younes-io, there are still some blocking issues to resolve; once they are addressed, we will support this feature promptly.