microsoft / LLMLingua

To speed up LLM inference and improve the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Bug]: AssertionError when executing Code.ipynb #137

Closed. maxcccc closed this issue 2 months ago.

maxcccc commented 2 months ago

Describe the bug

When I executed code.ipynb, an AssertionError occurred after 2-3 minutes:

from llmlingua import PromptCompressor
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "1"

from datasets import load_dataset

dataset = load_dataset("THUDM/LongBench", "repobench-p", split="test")

# select an example from LongBench (repobench-p)
contexts, question, answer = [
    dataset[1][key] for key in ["context", "input", "answers"]
]
instruction = "Please complete the code given below."
question = question + "\n\nNext line of code:\n"
prompt = "\n\n".join([instruction, contexts, question])

# no arguments: PromptCompressor loads its default compression model
llm_lingua = PromptCompressor()
# split the context into chunks of 4 lines each for context-level ranking
contexts_list = contexts.split("\n")
contexts_list = [
    "\n".join(contexts_list[ii : ii + 4]) for ii in range(0, len(contexts_list), 4)
]
compressed_prompt = llm_lingua.compress_prompt(
    contexts_list,
    instruction=instruction,
    question=question,
    target_token=2000,
    condition_compare=True,
    condition_in_question="after",
    rank_method="longllmlingua",
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,  # enable dynamic context compression
    reorder_context="sort",
)
print('compressed_prompt:', compressed_prompt["compressed_prompt"])
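
To isolate the failure from the LongBench download, a stripped-down variant of the same call can be run with placeholder contexts. This is only a sketch (the toy strings are illustrative and may not trigger the assertion), but it exercises the same compress_prompt arguments as above:

# Sketch: same compress_prompt call as above, with placeholder contexts
# instead of the LongBench example.
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # default model; set model_name/device_map if needed

toy_contexts = [
    "def add(a, b):\n    return a + b",
    "def sub(a, b):\n    return a - b",
]
compressed = llm_lingua.compress_prompt(
    toy_contexts,
    instruction="Please complete the code given below.",
    question="What does add return?\n\nNext line of code:\n",
    target_token=200,
    condition_compare=True,
    condition_in_question="after",
    rank_method="longllmlingua",
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,
    reorder_context="sort",
)
print(compressed["compressed_prompt"])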

Steps to reproduce

No response

Expected Behavior

No response

Logs

import sys; print('Python %s on %s' % (sys.version, sys.platform))
/data1/loat_proj/anaconda3/envs/langchain_clone/bin/python3 /home/itmaxc/.pycharm_helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client localhost --port 41205 --file /data1/loat_proj/codellama_rag/code_rag/llm_lingua_code_test.py 
Connected to pydev debugger (build 232.10072.31)
pydev debugger: warning: trying to add breakpoint to file that does not exist: /data1/loat_proj/codellama_rag/code_rag/c:/users/maxc/appdata/local/jetbrains/pycharm2023.2/remote_sources/-1504288812/-1096704295/langchain/vectorstores/elastic_vector_search.py (will have no effect)
pydev debugger: warning: trying to add breakpoint to file that does not exist: /data1/loat_proj/codellama_rag/code_rag/c:/users/maxc/appdata/local/jetbrains/pycharm2023.2/remote_sources/1047279509/-1103039716/transformers/generation/configuration_utils.py (will have no effect)
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:05<00:00,  2.63s/it]
/data1/loat_proj/anaconda3/envs/langchain_clone/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/data1/loat_proj/anaconda3/envs/langchain_clone/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/data1/loat_proj/anaconda3/envs/langchain_clone/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/data1/loat_proj/anaconda3/envs/langchain_clone/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/itmaxc/.pycharm_helpers/pydev/pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/itmaxc/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data1/loat_proj/codellama_rag/code_rag/llm_lingua_code_test.py", line 36, in <module>
    compressed_prompt = llm_lingua.compress_prompt(
  File "/data1/loat_proj/anaconda3/envs/langchain_clone/lib/python3.9/site-packages/llmlingua/prompt_compressor.py", line 676, in compress_prompt
    start = self.get_prefix_length(prefix + "\n\n", context[0])
  File "/data1/loat_proj/anaconda3/envs/langchain_clone/lib/python3.9/site-packages/llmlingua/prompt_compressor.py", line 995, in get_prefix_length
    assert self.tokenizer.decode(full_input_ids[i:]) == text[:100]
AssertionError
python-BaseException
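
The assertion that fails is a tokenizer round-trip check: get_prefix_length roughly tokenizes the prefix together with the first 100 characters of the first context and expects that decoding the suffix tokens reproduces those characters exactly. With Llama-style SentencePiece tokenizers this round trip is not always exact (leading spaces and unusual characters can shift token boundaries), which appears to be what trips the assert here. The snippet below is a minimal sketch of the same comparison, continuing from the reproduction code above and using an approximate split point instead of the library's exact boundary search:

# Sketch of the round-trip comparison behind the failing assertion.
prefix = instruction + "\n\n"        # stand-in for whatever LLMLingua prepends
text = contexts_list[0][:100]

tokenizer = llm_lingua.tokenizer
full_ids = tokenizer(prefix + text, add_special_tokens=False).input_ids
prefix_ids = tokenizer(prefix, add_special_tokens=False).input_ids

# Decode everything after the (approximate) prefix boundary and compare
# with the raw text; a mismatch mirrors the AssertionError above.
suffix = tokenizer.decode(full_ids[len(prefix_ids):])
print(repr(suffix))
print(suffix == text)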

Additional Information

No response

iofu728 commented 2 months ago

Hi @maxcccc, thanks for your feedback. We'll try to fix this issue soon.

maxcccc commented 2 months ago

Hi @iofu728 @SiyunZhao, is there any progress on this issue? I'm eager for your feedback.

iofu728 commented 2 months ago

Hi @maxcccc, this issue has been fixed. You can update to the latest version with pip install git+https://github.com/microsoft/LLMLingua.git.
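
A quick way to confirm that the updated package is the one being imported, before re-running the reproduction above (a small sketch; the exact version string depends on what pip installed from the main branch):

# Sketch: check the installed llmlingua version, then re-run the repro.
from importlib.metadata import version
print(version("llmlingua"))

from llmlingua import PromptCompressor
llm_lingua = PromptCompressor()
# ... re-run the compress_prompt call from the original report ...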