microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/

Getting errors when running phi2 #78

Open TempusFugit05 opened 5 months ago

TempusFugit05 commented 5 months ago

I'm getting the following error:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.15s/it]
Some weights of the model checkpoint at ../compressor/compressor_llm/phi2_dolphin were not used when initializing PhiForCausalLM: ['lm_head.linear.lora_A.default.weight', 'lm_head.linear.lora_B.default.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/home/tempus/assistant/pythonProject/llm.py", line 54, in <module>
    original_prompt = create_prompt(system=SYSTEM_MESSAGE, template=PROMPT_TEMPLATE, context_template=CONTEXT_TEMPLATE, user_prompt=prompt_input)
  File "/home/tempus/assistant/pythonProject/llm.py", line 31, in create_prompt
    context = token_compressor.compress_context(user_prompt, context)
  File "/home/tempus/assistant/pythonProject/token_compressor.py", line 16, in compress_context
    compressed_context = compressor.compress_prompt(
  File "/home/tempus/miniconda3/envs/llm/lib/python3.10/site-packages/llmlingua/prompt_compressor.py", line 253, in compress_prompt
    context = self.iterative_compress_prompt(
  File "/home/tempus/miniconda3/envs/llm/lib/python3.10/site-packages/llmlingua/prompt_compressor.py", line 754, in iterative_compress_prompt
    past_key_values = [
TypeError: 'NoneType' object is not iterable

It works just fine with a small number of tokens (<~350) but throws this error when I give it more. Am I doing it wrong? It doesn't happen with other models.

This is my code:

from llmlingua import PromptCompressor

instruction = "summarize the following text. Keep key information. Do not add any additional data and keep the facts accurate."

def compress_context(prompt, context):

    # Split the context into chunks of four lines each so the compressor
    # can rank and compress them independently.
    context_list = context.split("\n")
    context_list = ["\n".join(context_list[ii: ii + 4]) for ii in range(0, len(context_list), 4)]

    # Local Dolphin fine-tune of phi-2 as the compression model.
    compressor = PromptCompressor(model_name="../compressor/compressor_llm/phi2_dolphin")

    compressed_context = compressor.compress_prompt(
        context=context_list,
        instruction=instruction,
        question=prompt,
        condition_compare=True,
        condition_in_question='after',
        rank_method='longllmlingua',
        use_sentence_level_filter=False,
        dynamic_context_compression_ratio=0.4,  # enable dynamic_context_compression_ratio
        ratio=0.5,
        concate_question=False
    )

    return compressed_context["compressed_prompt"]

I'm using the Dolphin fine-tune of phi-2 for this.
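One more note on my snippet: I construct the PromptCompressor inside compress_context, so the model is reloaded on every call. Hoisting it to module scope avoids that; a sketch of what I mean (same body as above, minus the constructor):

from llmlingua import PromptCompressor

# Build the compressor once at module scope so the model isn't reloaded per call.
compressor = PromptCompressor(model_name="../compressor/compressor_llm/phi2_dolphin")

def compress_context(prompt, context):
    context_list = context.split("\n")
    context_list = ["\n".join(context_list[ii: ii + 4]) for ii in range(0, len(context_list), 4)]
    compressed_context = compressor.compress_prompt(
        context=context_list, instruction=instruction, question=prompt,
        condition_compare=True, condition_in_question='after',
        rank_method='longllmlingua', use_sentence_level_filter=False,
        dynamic_context_compression_ratio=0.4, ratio=0.5, concate_question=False,
    )
    return compressed_context["compressed_prompt"]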

iofu728 commented 5 months ago

Hi @TempusFugit05,

Thank you for your interest and support. The issue is that the model does not correctly return the KV cache during the forward pass: the previous custom code for phi-2 wasn't based on Hugging Face's implementation and didn't inherit from the corresponding parent class, so past_key_values comes back as None and the iterative compression step fails with the TypeError above.
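You can confirm this with a quick forward pass. A minimal sketch, assuming the local checkpoint path from your traceback:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "../compressor/compressor_llm/phi2_dolphin"  # your local fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # needed only while the checkpoint ships its own modeling_phi.py
)

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, use_cache=True)

# LLMLingua's iterative compression indexes into past_key_values;
# if this prints False, you will hit the TypeError above.
print(getattr(outputs, "past_key_values", None) is not None)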

One solution is to upgrade your transformers to the GitHub version (pip install git+https://github.com/huggingface/transformers.git) and use a call like

llm_lingua = PromptCompressor("microsoft/phi-2")

Alternatively, if there is a modeling_phi.py file in the '../compressor/compressor_llm/phi2_dolphin' directory, you can delete it or replace it with the version from https://github.com/huggingface/transformers/blob/main/src/transformers/models/phi/modeling_phi.py.
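Either way, once the forward pass returns a KV cache, your original call should work unchanged. With the Hub checkpoint it would look like this (same parameters as your snippet):

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(model_name="microsoft/phi-2")
compressed_context = llm_lingua.compress_prompt(
    context=context_list,
    instruction=instruction,
    question=prompt,
    condition_compare=True,
    condition_in_question='after',
    rank_method='longllmlingua',
    use_sentence_level_filter=False,
    dynamic_context_compression_ratio=0.4,
    ratio=0.5,
    concate_question=False,
)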