meta-llama / llama-models

Utilities intended for use with Llama models.

OSError with Llama3.2-3B-Instruct-QLORA_INT4_EO8 - missing files? #194

Open StephenQuirolgico opened 3 weeks ago

StephenQuirolgico commented 3 weeks ago

When trying to run Llama3.2-3B-Instruct-QLORA_INT4_EO8, I'm getting the error:

OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.

I've tried using transformers to pull the model as well as downloading it directly with llama model download. In both cases, the model downloads successfully, so I'm not sure why files are missing.

ashwinb commented 3 weeks ago

The files we provide via llama model download are intended to be run either via ExecuTorch or via llama-stack. As such, they don't have these other files you need. It sounds like your inference code is based on HuggingFace transformers, so you should download the files from the corresponding HuggingFace repositories.
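
For example, something like this (a minimal sketch; the repo id is illustrative -- it points at the non-quantized Instruct repo, which ships transformers-format weights -- and it assumes you have requested access to the gated repo and are logged in):

from huggingface_hub import snapshot_download

# Fetch a transformers-format checkpoint (config.json, safetensors weights, tokenizer files)
# from the Hub. Repo id is illustrative: the non-quantized 3B Instruct repo, not the QLORA variant.
local_dir = snapshot_download(repo_id="meta-llama/Llama-3.2-3B-Instruct")
print(local_dir)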

I'm curious which code needs these files and is producing this error?

StephenQuirolgico commented 3 weeks ago

Yes, I'm using transformers. I've tried both the pipeline and AutoModel APIs. Using pipeline:

import torch
from transformers import pipeline

model_id = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

and using AutoModel:

from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

Both methods produce the same error:

OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
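
For reference, listing the repo contents with huggingface_hub (a minimal sketch; assumes access to the gated repo and an authenticated session) shows whether any of the weight files transformers looks for is actually present:

from huggingface_hub import list_repo_files

# Print the files in the Hub repo; transformers raises the OSError above when none of
# pytorch_model.bin / model.safetensors / tf_model.h5 / model.ckpt / flax_model.msgpack is found.
for f in list_repo_files("meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"):
    print(f)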

init27 commented 3 weeks ago

@StephenQuirolgico Can you kindly confirm what version of transformers you're using?

StephenQuirolgico commented 3 weeks ago

transformers 4.46.0

WuhanMonkey commented 3 weeks ago

Hey @StephenQuirolgico, we are working with HF to have these weights converted and supported in transformers. For now, you can try either Llama Stack or exporting with ExecuTorch. Our official Llama website has more details on both.

We can also help you better if you share which platform you plan to run inference on and what use case you are trying to support.

StephenQuirolgico commented 3 weeks ago

@WuhanMonkey I'm running this on RHEL 8. I have existing code that uses transformers with Llama3.2-3B, and I just wanted to test the quantized version by swapping out the model in the code. Is there a rough timeframe for when these models will be supported in HF?