To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
[Question]: Token indices sequence length is longer than the specified maximum sequence length for this model (614 > 512). Running this sequence through the model will result in indexing errors #165
I am using the following configuration; why is it throwing this error? I see many 512 configurations in the llmlingua installation path. Do I need to retrain the model, or is this an issue with the llmlingua version?
self.model_compress = PromptCompressor(
    model_name="/xxx/llmlingua/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # whether to use LLMLingua-2
    llmlingua2_config={
        "max_batch_size": 100,
        "max_force_token": 4096,
    },
)
Hi @lifengyu2005, thanks for your support. These logs appear to be warnings rather than errors. Did your program actually crash because of them? Please share more details to help us identify the issue.
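For context, the 512 in the warning is the maximum sequence length of XLM-RoBERTa, the encoder backing llmlingua-2-xlm-roberta-large-meetingbank; encoder-based compressors typically handle longer prompts by windowing the token sequence rather than failing. A minimal sketch of such chunking (an illustration under that assumption, not LLMLingua's actual implementation):

```python
# XLM-RoBERTa accepts at most 512 token positions, so a longer input
# (e.g. the 614 tokens from the warning) must be split into windows.

def chunk_token_ids(token_ids, max_len=512):
    """Split a token-id sequence into consecutive windows of at most max_len."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

ids = list(range(614))         # stand-in for the 614 token ids from the warning
chunks = chunk_token_ids(ids)  # two windows: 512 tokens + 102 tokens
```

Each window stays within the encoder's limit, so the warning alone does not imply indexing errors as long as the library processes the input in such windows.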
llmlingua ver 0.2.2