StephenQuirolgico opened this issue 3 weeks ago

When trying to run `Llama3.2-3B-Instruct-QLORA_INT4_EO8`, I'm getting the error:

```
OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
```

I've tried using `transformers` to pull the model, as well as downloading the model directly using `llama model download`. In both cases, the model downloads successfully, so I'm not sure why these files are missing.
The files we provide via `llama model download` are intended to be run either via ExecuTorch or via llama-stack. As such, they don't include the other files you need. It sounds like your inference code is based on HuggingFace transformers, so you should download the files from the corresponding HuggingFace repositories.
I am curious: what code needs these files and is spitting out this error?
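For reference, a minimal sketch of pulling transformers-compatible files from the Hub with `huggingface_hub`; the repo id below is the non-quantized 3B Instruct model and is an assumption here, since the QLoRA variant does not ship transformers-format weights:

```python
# Minimal sketch: download the transformers-compatible files from the Hub.
# Assumption: the non-quantized repo id below is a stand-in; the
# QLORA_INT4_EO8 repo does not ship transformers-format weights.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.2-3B-Instruct",
    # token="hf_...",  # gated repo: pass a token or run `huggingface-cli login`
)
print(local_dir)
```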
Yes, I'm using transformers. I've tried both the transformers pipeline and AutoModel:
```python
import torch
from transformers import pipeline

model_id = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
and AutoModel:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Both methods produce the same error:

```
OSError: meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8 does not appear to have a file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt or flax_model.msgpack.
```
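One way to confirm what the Hub repo actually contains is to list its files; a minimal sketch using `huggingface_hub` (the repo is gated, so a valid access token may be required):

```python
# Minimal sketch: list the files in the Hub repo to see which weight
# formats are actually present (the repo is gated, so you may need to
# be logged in via `huggingface-cli login` or pass token=...).
from huggingface_hub import list_repo_files

files = list_repo_files("meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8")
print("\n".join(files))
```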
@StephenQuirolgico Can you kindly confirm what version of transformers you're using?
transformers 4.46.0
Hey @StephenQuirolgico, we are working with HF to have these weights converted and supported in transformers. For now, you can try either llama-stack or exporting with ExecuTorch. Our official Llama website has more details on both.
We can also help you better if you share which platform you plan to run inference on and which use cases you are targeting.
@WuhanMonkey I'm running this on RHEL 8. I have existing code using transformers and Llama3.2-3B, and just wanted to test the quantized version (by just swapping out the model in the code). Is there a rough timeframe on when these models will be supported in HF?
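In the meantime, one possible stopgap (assuming bitsandbytes is installed) is to load the non-quantized checkpoint and quantize it at load time, with the caveat that standard bnb 4-bit quantization is not the same scheme as QLORA_INT4_EO8:

```python
# Minimal sketch of a stopgap: quantize the non-quantized base model at
# load time with bitsandbytes. NOTE: this is ordinary bnb 4-bit (NF4)
# quantization, not the QLORA_INT4_EO8 scheme of the Meta checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-3.2-3B-Instruct"  # assumed base repo id
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```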