microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

How to run with llama-1? #128

Closed liuxiaozhu01 closed 4 months ago

liuxiaozhu01 commented 5 months ago

Since decapoda-research/llama-7b-hf is no longer accessible on Hugging Face, I chose to download baffo32/decapoda-research-llama-7B-hf as a replacement. The command I use is

python experiments/run_slicegpt.py \
  --model baffo32/decapoda-research-llama-7B-hf \
  --model-path /root/home/workspace/LLM/llama/decapoda-research/llama-7b-hf \

I'm sure the model path is valid, but it raises NotImplementedError: /root/home/workspace/LLM/llama/decapoda-research/llama-7b-hf is neither a Hugging Face model nor a supported local model.
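The checkpoint files themselves look fine; here is a minimal sanity check with plain transformers (the path below is my local one, substitute your own):

    from transformers import AutoConfig

    # Hypothetical local path; point this at your own download location.
    path = "/root/home/workspace/LLM/llama/decapoda-research/llama-7b-hf"

    # If this succeeds, the checkpoint is a valid Hugging Face model and the
    # NotImplementedError comes from SliceGPT's model-name check, not the files.
    config = AutoConfig.from_pretrained(path)
    print(config.model_type)  # expected: "llama"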

What can I do? Can anyone help?

liuxiaozhu01 commented 5 months ago

Vicuna hits the same problem.

python experiments/run_slicegpt.py \
  --model lmsys/vicuna-7b-v1.5 \
  --model-path /root/home/workspace/LLM/vicuna/lmsys/vicuna-7b-v1.5 \

The same "neither a Hugging Face model nor a supported local model" error is raised.

liuxiaozhu01 commented 5 months ago

I modified this check in the source:

        if (
            not model_name.startswith("meta-llama/Llama-2")
            and not model_name.startswith("decapoda-research/llama-7b-hf")
            and not model_name.startswith("lmsys/vicuna-7b-v1.5")
        ):
            return None

With this change, the LLaMA-1 and Vicuna pretrained models can be loaded and everything runs well, but I don't know whether it has any side effects.
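A possibly less brittle variant (a sketch only; is_llama_like is a hypothetical helper, not something in this repo) would gate on what the checkpoint's config declares rather than on hard-coded repo names:

    from transformers import AutoConfig

    def is_llama_like(model_name_or_path: str) -> bool:
        # Hypothetical helper: accept any checkpoint whose config declares the
        # Llama architecture, instead of matching specific repo names.
        config = AutoConfig.from_pretrained(model_name_or_path)
        return config.model_type == "llama"

Of course, this is only safe for checkpoints whose compute graph really does match the existing Llama-2 code path.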

nailimixaM commented 5 months ago

Hi @liuxiaozhu01, thanks for trying out our code on different models! The safest way to use SliceGPT on model architectures other than the ones we've implemented (Phi-2, Llama-2, OPT) is to implement an adapter for the new architecture; see these instructions. If you wanted to do so for e.g. Vicuna, we'd love to receive your PR!

The modification you made to load Vicuna and decapoda's model will only work if the architectures exactly match Llama-2 (having different weights is fine). That might be the case, but I'm unfamiliar with them. I don't think LLaMA-1 has the same architecture as Llama-2, so it would definitely need its own adapter. Hope that helps!
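One quick way to check (a sketch; meta-llama/Llama-2-7b-hf is a gated repo, so this assumes you have access to both configs) is to diff the config fields that determine the compute graph:

    from transformers import AutoConfig

    a = AutoConfig.from_pretrained("baffo32/decapoda-research-llama-7B-hf")
    b = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo

    # Fields that determine whether the two compute graphs actually match.
    for field in ("model_type", "hidden_size", "intermediate_size",
                  "num_hidden_layers", "num_attention_heads",
                  "max_position_embeddings"):
        print(field, getattr(a, field, None), getattr(b, field, None))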

liuxiaozhu01 commented 4 months ago

Hi @nailimixaM, thanks for your reply! It seems that LLaMA-1 has the same architecture as Llama-2 at the 7B and 13B parameter counts, and vicuna-7b/13b is the same. In huggingface/transformers, LlamaForCausalLM works for loading all of the models above. The modification I made runs well for llama-7b and vicuna-7b, so the result might make sense?
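For what it's worth, both checkpoints declare their model class in their configs (a quick check; I'd expect ['LlamaForCausalLM'] for both, though I haven't diffed every field):

    from transformers import AutoConfig

    for name in ("baffo32/decapoda-research-llama-7B-hf", "lmsys/vicuna-7b-v1.5"):
        config = AutoConfig.from_pretrained(name)
        print(name, config.architectures)  # e.g. ['LlamaForCausalLM']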