guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

starchat-alpha change (cache=True on model) causes guidance to crash at end of completion #103

Closed: bluecoconut closed this issue 1 year ago

bluecoconut commented 1 year ago

The bug
After an update to the Starchat-alpha model (https://huggingface.co/HuggingFaceH4/starchat-alpha/commit/2f20a76066a3fd9d7b0c28d5f11999042aebb2f4), the model's _past_key_values appears to be in a format that guidance's llm and Transformers classes do not expect.
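For context, here is my understanding of the two cache layouts (a minimal sketch; the StarCoder shape is inferred from the GPTBigCode multi-query attention code in transformers, not verified against this exact revision):

import torch

# GPT-2 style cache: a (key, value) pair per layer,
# each tensor shaped (batch, num_heads, seq_len, head_dim)
gpt2_layer = (torch.zeros(1, 12, 10, 64), torch.zeros(1, 12, 10, 64))
print(gpt2_layer[0].shape[2])     # 10 -> seq_len lives at index 2

# StarCoder (GPTBigCode, multi-query attention): a single fused tensor
# per layer, shaped (batch, seq_len, 2 * head_dim) -- key and value are
# packed together, with one shared KV head instead of one per query head
starcoder_layer = torch.zeros(1, 10, 256)
print(starcoder_layer.shape[-2])  # 10 -> seq_len is second from the end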

To Reproduce
A minimal Transformers subclass wrapping the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import guidance

class StarcoderChat(guidance.llms.Transformers):
    def __init__(self, model_path, **kwargs):
        # pin revision='5058bd8557100137ade3c459bfc8100e90f71ec7' to get the pre-update weights
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto', torch_dtype=torch.bfloat16)
        super().__init__(model, tokenizer=tokenizer, device_map='auto', **kwargs)

    @staticmethod
    def role_start(role):
        return "<|" + role + "|>"

    @staticmethod
    def role_end(role):
        return '<|end|>'

model_path = "HuggingFaceH4/starchat-alpha"
guidance.llm = StarcoderChat(model_path)

prompt = guidance('''
{{#system~}}
You are a helpful and terse assistant.
{{~/system}}
{{#user~}}
How do you print something cool in python
{{~/user}}
{{#assistant~}}
{{gen 'answer' stop='<|end|>'}}
{{~/assistant}}''')
prompt()

Running this fails with:

File "/home/vscode/.local/lib/python3.10/site-packages/guidance/library/_gen.py", line 151, in gen
    for resp in gen_obj:
  File "/home/vscode/.local/lib/python3.10/site-packages/guidance/llms/_transformers.py", line 361, in _stream_then_save
    self._update_prefix_cache(streamer)
  File "[/home/vscode/.local/lib/python3.10/site-packages/guidance/llms/_transformers.py](https://vscode-remote+ssh-002dremote-002bazure-002da100.vscode-resource.vscode-cdn.net/home/vscode/.local/lib/python3.10/site-packages/guidance/llms/_transformers.py)", line 352, in _update_prefix_cache
    self._prefix_cache = streamer.generated_sequence[0][:self._past_key_values[0][0].shape[2]] # self._past_key_values is already saved, this just aligns with it
IndexError: tuple index out of range

Error in program:  tuple index out of range
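If I read the failing line correctly, the IndexError follows from the fused layout sketched above: self._past_key_values[0] is a single 3-D tensor rather than a (key, value) pair, so the [0][0] indexing also peels off the batch dimension, leaving a 2-D tensor whose shape has no index 2. A small reproduction of just that failure:

import torch

past = (torch.zeros(1, 10, 256),)  # one fused tensor per layer (StarCoder style)
past[0][0].shape[2]                # shape is (10, 256) -> IndexError: tuple index out of range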


slundberg commented 1 year ago

Thanks! I had not seen that format for past_key_values before... and I can't say I fully understand it, since StarCoder only has 256 floats per key/value pair; how you cache 48 heads' worth of KVs in 256 floats is something I would need to dig into. But regardless, the HF implementation for that model indexes the sequence length from the end of the shape (shape[-2]) instead of from the start (shape[2]), so guidance now does the same and everything runs.
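In other words (a paraphrase of the fix, not the exact committed code), reading the sequence length from the end of the shape works for both layouts:

def cache_seq_len(past_key_values):
    # The sequence dimension is second from the end in both layouts:
    # GPT-2:     (key, value) pairs shaped (batch, heads, seq_len, head_dim)
    # StarCoder: fused tensors shaped (batch, seq_len, 2 * head_dim)
    first = past_key_values[0]
    tensor = first[0] if isinstance(first, tuple) else first
    return tensor.shape[-2]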

sameerp30 commented 11 months ago

How was this _past_key_values error solved for the StarCoder model? Its tensor format differs from GPT-2's, so line 281 of the _transformers.py file raises an error:

self._past_key_values = tuple((key[:,:,:prefix_match_len,:],value[:,:,:prefix_match_len,:]) for key,value in self._past_key_values)
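Presumably the prefix trimming needs the same layout awareness as the sequence-length fix above; a hedged sketch (trim_cache is a hypothetical helper for illustration, not guidance's actual code):

def trim_cache(past_key_values, prefix_match_len):
    trimmed = []
    for layer in past_key_values:
        if isinstance(layer, tuple):
            # GPT-2 layout: (key, value), each (batch, heads, seq_len, head_dim)
            key, value = layer
            trimmed.append((key[:, :, :prefix_match_len, :],
                            value[:, :, :prefix_match_len, :]))
        else:
            # StarCoder layout: one fused tensor (batch, seq_len, 2 * head_dim)
            trimmed.append(layer[:, :prefix_match_len, :])
    return tuple(trimmed)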