johnsmith0031 / alpaca_lora_4bit


Problem loading safetensor file format #110

Open · ortegaalfredo opened this issue 1 year ago

ortegaalfredo commented 1 year ago

Hi! I'm trying to load the model at https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-4bit-128g, but it only generates garbage.

I'm only running inference with the model, not training yet. Here is an example of the output:

```
Loading Model ...
The safetensors archive passed at ./models/llama-13b-4bit/llama-13b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Loaded the model in 6.03 seconds.
Fitting 4bit scales and zeros to half
Apply AMP Wrapper ...
I think the meaning of life isbahrist InitSTMo�Ãbahbah�OF MomoãbahSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMo
```
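
For context, this is roughly how I'm loading it (a minimal sketch using this repo's `load_llama_model_4bit_low_ram` from `autograd_4bit`; the exact paths and `groupsize=128`, inferred from the "128g" in the file name, are my assumptions):

```python
# Minimal sketch, not my exact script: load a 4-bit GPTQ LLaMA checkpoint
# with this repo's low-RAM loader. Paths are placeholders; groupsize=128 is
# inferred from the "128g" suffix of the checkpoint name.
from autograd_4bit import load_llama_model_4bit_low_ram

model, tokenizer = load_llama_model_4bit_low_ram(
    "./models/llama-13b-4bit/",                                 # HF config dir
    "./models/llama-13b-4bit/llama-13b-4bit-128g.safetensors",  # quantized weights
    groupsize=128,
)
```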

Any guidance about how to load those models?

johnsmith0031 commented 1 year ago

Try this:

```python
# Switch off the "faster" kernel path before running any forward pass.
import matmul_utils_4bit
matmul_utils_4bit.faster = False
```
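
As far as I can tell, the `faster` kernel does its matmul in half precision, so with `faster = False` the dispatch falls back to the float32 kernel; set the flag before running generation.
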
ortegaalfredo commented 1 year ago

Still not working; now it fails with a dtype mismatch:

```
  File "/home/guest/ai/finetune/alpaca_lora_4bit/matmul_utils_4bit.py", line 131, in matmul4bit
    output = _matmul4bit_v2(x, qweight, scales, zeros, g_idx)
  File "/home/guest/ai/finetune/alpaca_lora_4bit/matmul_utils_4bit.py", line 73, in _matmul4bit_v2
    quant_cuda.vecquant4matmul(x, qweight, y, scales, zeros, g_idx)
RuntimeError: expected scalar type Float but found Half
```

johnsmith0031 commented 1 year ago

Try converting all scales to float?
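
A minimal sketch of what that could look like, assuming the quantized layers keep their scales in a `scales` tensor the way `Autograd4bitQuantLinear` does in this repo (adjust the attribute name if your version differs):

```python
# Minimal sketch: upcast every quantized layer's scales to float32 so they
# match what quant_cuda.vecquant4matmul expects on the non-"faster" path.
# Assumes the 4-bit layers keep their scales in a `scales` tensor, as
# Autograd4bitQuantLinear does in this repo; adjust if yours differs.
import torch

def scales_to_float(model):
    for module in model.modules():
        scales = getattr(module, "scales", None)
        if isinstance(scales, torch.Tensor) and scales.dtype != torch.float32:
            module.scales = scales.float()

scales_to_float(model)  # run after loading, before generation
```

If your checkout's `autograd_4bit` also exposes a `model_to_float` helper, that might be the one-line version of the same thing.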