microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

Curious to integrate the together.ai API to optimize latency. #70

Open Pr0fe5s0r opened 6 months ago

Pr0fe5s0r commented 6 months ago

Hey guys, just got an idea. It would be great if we added the together.ai API to this project, because they serve almost every popular open-source model through API calls. What do you think?

iofu728 commented 6 months ago

Hi @Pr0fe5s0r,

Thank you for your support of LLMLingua. I briefly reviewed Together.ai and noticed that they provide logprobs (https://docs.together.ai/docs/inference-parameters#logprops-api-only). However, I'm not certain if they can return logprobs for the prompt section. If they do, we could consider providing support for it.

Thank you for the information.
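For reference, here is a minimal sketch of how one might probe whether the prompt section is scored, assuming Together exposes an OpenAI-compatible `/v1/completions` endpoint that honors `echo` and `logprobs` (the endpoint path, parameter names, model name, and response shape below are assumptions, not verified behavior):

```python
import os
import requests

# Hypothetical probe: request the prompt tokens' logprobs by echoing the
# prompt back with no generation. Endpoint, parameters, and response fields
# follow the OpenAI-style completions API and may differ for Together.
resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "mistralai/Mixtral-8x7B-v0.1",  # placeholder model name
        "prompt": "The quick brown fox jumps over the lazy dog.",
        "max_tokens": 0,  # generate nothing; we only want the prompt scored
        "echo": True,     # ask the API to return the prompt tokens themselves
        "logprobs": 1,    # request per-token log probabilities
    },
    timeout=30,
)
resp.raise_for_status()
choice = resp.json()["choices"][0]

# If "logprobs" covers the prompt tokens (not just the completion),
# LLMLingua-style compression could be driven by this API.
print(choice.get("logprobs"))
```

The `max_tokens=0` plus `echo` combination is the usual OpenAI-completions trick for scoring a prompt without generating anything; whether Together supports it is exactly the open question above.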

Ice-Hazymoon commented 4 months ago

Hi @iofu728 ,

I'm looking forward to this, any progress?

It's a great project, but my server is too slow to run an LLM locally; I'd like to use the together.ai or ChatGPT API instead.
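If the API does echo prompt logprobs, the per-token scores LLMLingua needs could in principle be computed client-side. A rough sketch of turning an OpenAI-style `token_logprobs` list into a prompt perplexity (a hypothetical helper, not part of LLMLingua):

```python
import math
from typing import List, Optional

def prompt_perplexity(token_logprobs: List[Optional[float]]) -> float:
    """Compute perplexity over prompt tokens from API-returned logprobs.

    OpenAI-style responses report None for the first token (it has no
    preceding context), so None entries are skipped here.
    """
    lps = [lp for lp in token_logprobs if lp is not None]
    if not lps:
        raise ValueError("no logprobs returned for the prompt")
    return math.exp(-sum(lps) / len(lps))

# Example with made-up logprob values:
print(prompt_perplexity([None, -1.2, -0.3, -2.5]))  # exp(4.0 / 3) ≈ 3.79
```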