
microsoft / LLMLingua

To speed up LLM inference and improve LLMs' perception of key information, LLMLingua compresses the prompt and KV cache, achieving up to 20x compression with minimal performance loss.

https://llmlingua.com/

MIT License

4.42k stars 241 forks

[design] Interface Design #52

Open mydmdm opened 7 months ago

mydmdm commented 7 months ago

Purpose

Engine Interface

# Example of engine interface

get_ppl: Computes perplexity (PPL) and contrastive PPL; supports the KV cache.
get_relevance_rank: Returns the relevance ranking between context and question.
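
To make the two engine methods concrete, here is a minimal sketch of what such an interface might look like. Only the names `get_ppl` and `get_relevance_rank` come from this proposal; the `Engine` and `ToyEngine` classes, the parameter names, the return types, and the reading of "contrastive PPL" as a PPL difference are all assumptions. The toy backend is a smoothed unigram model rather than an LLM, and it leaves out KV-cache support entirely.

```python
import math
from abc import ABC, abstractmethod
from collections import Counter
from typing import List, Optional


class Engine(ABC):
    """Hypothetical engine interface; the signatures are assumptions."""

    @abstractmethod
    def get_ppl(self, text: str, condition: Optional[str] = None) -> float:
        """Perplexity of `text`, or a contrastive value when `condition` is set."""

    @abstractmethod
    def get_relevance_rank(self, contexts: List[str], question: str) -> List[int]:
        """Indices into `contexts`, most relevant to `question` first."""


class ToyEngine(Engine):
    """Stand-in backend: an add-one-smoothed unigram model, not a real LLM."""

    def __init__(self, corpus: str) -> None:
        self.counts = Counter(corpus.lower().split())

    def _ppl(self, words: List[str], extra: Counter) -> float:
        counts = self.counts + extra
        total = sum(counts.values())
        vocab = len(counts) + 1  # one extra slot for unseen words
        log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
        return math.exp(-log_prob / len(words))

    def get_ppl(self, text: str, condition: Optional[str] = None) -> float:
        words = text.lower().split()
        if not words:
            raise ValueError("text must contain at least one word")
        base = self._ppl(words, Counter())
        if condition is None:
            return base
        # Assumed reading of "contrastive": how much the condition lowers PPL.
        return base - self._ppl(words, Counter(condition.lower().split()))

    def get_relevance_rank(self, contexts: List[str], question: str) -> List[int]:
        # Toy relevance: number of distinct words shared with the question.
        q_words = set(question.lower().split())
        overlap = [len(q_words & set(c.lower().split())) for c in contexts]
        return sorted(range(len(contexts)), key=lambda i: overlap[i], reverse=True)
```

Keeping the engine behind an abstract class like this would let the core compression steps stay independent of the model that produces the scores, which seems to be the intent of separating the engine from the core interface.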

Core Interface

coarse_level_compression_in_document: Compresses the document/demonstration at a coarse level, allocating the budget and dynamic compression ratio.

coarse_level_compression_in_sentence: Performs coarse-level compression of sentences.
iterative_token_level_compression: Compresses the prompt at the token level.
subsequence_recover: Recovers based on the subsequence relationship.
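
The four core steps listed above could be sketched as plain functions along the following lines. The function names are taken from this proposal, but every signature, the `Score` callable, the `_keep_top` helper, and the heuristics (word-count budgets, fixed-size segments, position lookup for recovery) are hypothetical stand-ins. In particular, this toy token-level step scores each segment independently instead of conditioning on previously compressed segments.

```python
import math
from typing import Callable, List, Optional

Score = Callable[[str], float]  # hypothetical: higher means "more worth keeping"


def _keep_top(items: List[str], score: Score, k: int) -> List[str]:
    """Keep the k highest-scoring items, preserving their original order."""
    ranked = sorted(range(len(items)), key=lambda i: score(items[i]), reverse=True)
    return [items[i] for i in sorted(ranked[:k])]


def coarse_level_compression_in_document(
    docs: List[str], score: Score, budget: int
) -> List[str]:
    """Greedily keep the best documents whose total word count fits the budget."""
    kept, used = set(), 0
    for i in sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True):
        size = len(docs[i].split())
        if used + size <= budget:
            kept.add(i)
            used += size
    return [docs[i] for i in sorted(kept)]


def coarse_level_compression_in_sentence(doc: str, score: Score, keep: int) -> str:
    """Keep the `keep` best sentences of one document (split naively on '.')."""
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    kept = _keep_top(sentences, score, keep)
    return ". ".join(kept) + "." if kept else ""


def iterative_token_level_compression(
    text: str, score: Score, ratio: float, segment: int = 4
) -> List[str]:
    """Walk the text segment by segment, keeping the top `ratio` of each segment."""
    tokens = text.split()
    out: List[str] = []
    for start in range(0, len(tokens), segment):
        chunk = tokens[start:start + segment]
        out.extend(_keep_top(chunk, score, max(1, math.ceil(ratio * len(chunk)))))
    return out


def subsequence_recover(
    original: List[str], compressed: List[str]
) -> Optional[List[int]]:
    """Positions in `original` matched by `compressed`, or None if not a subsequence."""
    positions, j = [], 0
    for token in compressed:
        while j < len(original) and original[j] != token:
            j += 1
        if j == len(original):
            return None
        positions.append(j)
        j += 1
    return positions
```

Passing the scorer in as a callable keeps these steps testable with something as simple as `len`, while a real setup would presumably plug in scores derived from the engine's perplexity or relevance output.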

Wrapper Interface

compress_prompt: Returns the compressed prompt.
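
As a rough illustration of how a wrapper might tie the stages together, the sketch below implements a self-contained toy `compress_prompt`. Only the function name comes from this proposal. The parameters (`context`, `question`, `target_token`), the `FILLER` word list, and both heuristics (word overlap for relevance, filler-word removal for token-level compression) are assumptions made for the example and say nothing about how the actual implementation works.

```python
from typing import List

# Toy stand-in for low-information tokens; a real engine would score tokens instead.
FILLER = {"a", "an", "the", "of", "to", "is", "and", "in", "on"}


def compress_prompt(context: List[str], question: str, target_token: int) -> str:
    """Return a compressed prompt: selected, shortened contexts plus the question."""
    q_words = set(question.lower().split())

    # Coarse step: visit contexts from most to least word overlap with the question.
    order = sorted(
        range(len(context)),
        key=lambda i: len(q_words & set(context[i].lower().split())),
        reverse=True,
    )

    kept, used = [], 0
    for i in order:
        # Token step: drop filler words from the candidate context.
        tokens = [t for t in context[i].split() if t.lower() not in FILLER]
        # Keep the context only if it still fits the remaining word budget.
        if used + len(tokens) <= target_token:
            kept.append((i, " ".join(tokens)))
            used += len(tokens)

    # Restore the original context order before appending the question.
    body = "\n".join(text for _, text in sorted(kept))
    return f"{body}\n{question}" if body else question


if __name__ == "__main__":
    demo = compress_prompt(
        ["The capital of France is Paris", "Bananas are rich in potassium"],
        "What is the capital of France?",
        target_token=3,
    )
    print(demo)
```

The budget here counts whitespace-separated words in the contexts only; the question is always appended in full, which is a design choice of this sketch rather than something stated in the proposal.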