Closed: bluecoconut closed this issue 1 year ago
Thanks! I had not seen that format for past_key_values before, and I can't say I fully understand it, since StarCoder only has 256 floats per key/value pair. How you cache 48 heads' worth of KVs in 256 floats is something I would need to dig into to understand. But regardless, in the HF implementation for that model they index the sequence length at -2 from the end of the shape instead of +2 from the start, so now we do the same and everything runs.
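The indexing difference comes from the two cache layouts having different ranks: GPT-2 caches one 4-D tensor per head-split key/value, while StarCoder uses multi-query attention, which shares a single key/value head across all 48 query heads and fuses key and value into one tensor (with head_dim 128, that is 2 x 128 = 256 floats per position). A minimal sketch, pure Python and shapes only; the exact dimensions below are illustrative assumptions, not taken from the real model configs:

```python
# GPT-2-style cache entry: (batch, num_heads, seq_len, head_dim)
gpt2_shape = (1, 12, 37, 64)

# StarCoder-style MQA cache entry (key and value fused):
# (batch, seq_len, 2 * head_dim) -> 2 * 128 = 256 floats per position
starcoder_shape = (1, 37, 256)

def seq_len(shape):
    # Counting -2 from the end lands on the sequence axis in both
    # layouts; shape[2] is only correct for the 4-D GPT-2 layout.
    return shape[-2]

print(seq_len(gpt2_shape))       # 37
print(seq_len(starcoder_shape))  # 37
print(starcoder_shape[2])        # 256 -- indexing +2 from the start is wrong here
```

This is why switching from shape[2] to shape[-2] makes the same code work for both models.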
How was this error with _past_key_values solved for the StarCoder model? Its tensor format is different from GPT-2's, which causes an error at line no. 281 of the _transformer.py file:
self._past_key_values = tuple((key[:, :, :prefix_match_len, :], value[:, :, :prefix_match_len, :]) for key, value in self._past_key_values)
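The slice above hard-codes a 4-D layout (it trims axis 2), which breaks on StarCoder's 3-D fused cache. A hedged sketch of a layout-agnostic alternative, operating on shapes only as a stand-in for real tensors, is to trim the second-to-last axis instead:

```python
def trim_to_prefix(shape, prefix_len):
    """Return a cache entry's shape after trimming the sequence axis
    (always second from the end in these layouts) to prefix_len.
    Shapes here are illustrative stand-ins for real tensors."""
    new_shape = list(shape)
    new_shape[-2] = min(new_shape[-2], prefix_len)
    return tuple(new_shape)

# GPT-2 layout: (batch, heads, seq, head_dim)
print(trim_to_prefix((1, 12, 37, 64), 10))  # (1, 12, 10, 64)
# StarCoder MQA layout: (batch, seq, 2 * head_dim)
print(trim_to_prefix((1, 37, 256), 10))     # (1, 10, 256)
```

With real torch tensors, the equivalent rank-agnostic slice would be key[..., :prefix_match_len, :], since Ellipsis absorbs however many leading dimensions there are.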
The bug
After an update to the starchat-alpha model (https://huggingface.co/HuggingFaceH4/starchat-alpha/commit/2f20a76066a3fd9d7b0c28d5f11999042aebb2f4), it seems that the model's _past_key_values is in an unexpected format for what the llm and transformer classes expect.

To Reproduce
A simple Transformers model based on it gets an error:
System info (please complete the following information):
guidance.__version__: guidance==0.0.54 and guidance==0.0.47 (I checked both versions; both showed the same error)