Describe the bug
The TokenClfDataset is initialized without a model_name parameter and therefore defaults to bert-base-multilingual-cased, meaning that incorrect special tokens are used in llmlingua-2, i.e. [CLS]/[SEP]/[PAD] instead of <s>/</s>/<pad>.
The tokenizer simply treats these wrong special tokens (bos/eos/pad) as unknown tokens; I don't know exactly what effect that has. The difference in the compressed output is not very significant, but there is some difference.
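For illustration, a minimal snippet (not from the repo, just standard transformers usage) showing that the xlm-roberta-large tokenizer maps the BERT-style special tokens to its unknown-token id, while its own special tokens are different:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-large")

# The BERT-style special tokens are not in the xlm-roberta vocabulary,
# so each of them resolves to the <unk> token id.
print(tok.convert_tokens_to_ids(["[CLS]", "[SEP]", "[PAD]"]))  # e.g. [3, 3, 3]
print(tok.unk_token_id)                                        # 3

# The special tokens the model was actually trained with.
print(tok.cls_token, tok.sep_token, tok.pad_token)             # <s> </s> <pad>
```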
Steps to reproduce
Add print(tokenized_text) at line 57 in utils.py to see the wrong tokens used for the xlm-roberta-large based compression model, for example by triggering the compression path as shown below.
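A repro sketch following the LLMLingua-2 usage from the project README (the prompt string is a placeholder):

```python
from llmlingua import PromptCompressor

# LLMLingua-2 with the xlm-roberta-large based compression model.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

# With print(tokenized_text) added in utils.py, this call prints the
# bert-base-multilingual-cased special tokens instead of <s>/</s>/<pad>.
result = compressor.compress_prompt("Some long prompt to compress ...", rate=0.33)
print(result["compressed_prompt"])
```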
Expected Behavior
The correct special tokens should be used for the respective compression model.
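A minimal sketch of the kind of fix, assuming the dataset is constructed at the call site in prompt_compressor.py roughly as below (the self.* attribute names are assumptions, not the actual code):

```python
# Current (schematically): model_name is not passed, so TokenClfDataset
# falls back to its bert-base-multilingual-cased default.
dataset = TokenClfDataset(texts, max_len=self.max_seq_len, tokenizer=self.tokenizer)

# Fixed: forward the compression model's name so the matching special
# tokens (<s>/</s>/<pad> for xlm-roberta-large) are used.
dataset = TokenClfDataset(
    texts,
    max_len=self.max_seq_len,
    tokenizer=self.tokenizer,
    model_name=self.model_name,
)
```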
Additional Information