Open jean-anton opened 4 days ago
The attention_mask should be replaced with the causal_mask key!
input_dict = {
    'input_ids': inputs['input_ids'],
    'causal_mask': inputs['attention_mask']
}
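Note that, depending on how the model was converted, the causal_mask input may need to be a 4D additive float mask (large negative values at future positions, zeros elsewhere) rather than the tokenizer's 2D attention_mask. A minimal, untested sketch assuming that layout:

import numpy as np

seq_len = inputs['input_ids'].shape[-1]
# Additive causal mask: -1e9 above the diagonal (future positions), 0 elsewhere
causal_mask = np.triu(np.full((seq_len, seq_len), -1e9, dtype=np.float32), k=1)
causal_mask = causal_mask[None, None, :, :]  # shape (1, 1, seq_len, seq_len)

input_dict = {
    'input_ids': inputs['input_ids'].astype(np.int32),
    'causal_mask': causal_mask
}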
Thank you for your response, but now I get another error. Could you please share the full Python code that you have tested for this model and that runs on the ANE?
new error:
Error: value type not convertible:
[[128000 9906 11 1268 527 499 30]]
Traceback (most recent call last):
  File "/Users/jg/Devel/Projects/Pycharm/CoreML_Llama3/main_gen_text_lama405B_test1.py", line 29, in <module>
    generated_text = generate_text(prompt)
  File "/Users/jg/Devel/Projects/Pycharm/CoreML_Llama3/main_gen_text_lama405B_test1.py", line 20, in generate_text
    predictions = model.predict(input_dict)
  File "/Users/jg/Devel/Projects/Pycharm/CoreML_Llama3/.venv/lib/python3.8/site-packages/coremltools/models/model.py", line 777, in predict
    return self._get_predictions(self.__proxy__,
  File "/Users/jg/Devel/Projects/Pycharm/CoreML_Llama3/.venv/lib/python3.8/site-packages/coremltools/models/model.py", line 827, in _get_predictions
    return proxy.predict(data, state)
RuntimeError: value type not convertible
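(For reference, this error often comes from a dtype mismatch: the tokenizer returns int64 token ids by default, while the converted Core ML model typically expects int32 and float inputs. A minimal, untested sketch of casting before prediction, reusing the causal_mask sketch above as an assumption:)

import numpy as np

input_dict = {
    'input_ids': inputs['input_ids'].astype(np.int32),   # int64 -> int32
    'causal_mask': causal_mask.astype(np.float32)         # float mask, not the 2D attention_mask
}
predictions = model.predict(input_dict)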
Unfortunately, the model currently does not work on the ANE, and we are still investigating this issue! I will update the inference-related logic shortly, so please wait. Thank you.
OK, thank you. I will wait for your update!
Sorry for the delay! I’ve updated the code to make the model compatible with AutoTokenizer. Take a look at the “Inference.ipynb” file, and feel free to let me know if you run into any problems.
The model outputs tokens, not text.
Thank you for your response, but I have tried to get the text out of the tokens without success. Could you please share code that does that? I tried:
from transformers import AutoTokenizer
import coremltools as ct
import os
import numpy as np

model_path = "Llama-3.2-1B-Instruct.mlpackage"
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", token=os.environ["HF_TOKEN"]
)
mlmodel_fp16 = ct.models.MLModel(model_path)

inputs = tokenizer("Hello how are you?", return_tensors='np')
tok = inputs['input_ids']
st_len = tok.shape[-1]

state = mlmodel_fp16.make_state()  # Initialize the model state (reused across iterations)

max_length = 100  # Maximum length of the generated response
eos_token_id = tokenizer.eos_token_id  # EOS token ID
temperature = 0.7  # Temperature parameter

while st_len < max_length:
    # Build the additive causal mask for the new sequence length
    mask = np.full((1, st_len := st_len + 1), -1e9)
    mask = np.triu(mask, k=1)
    mask = np.hstack(
        [np.zeros((1, 1)), mask]
    )[None, None, :, :]
    input_dict = {
        'input_ids': tok.astype(np.int32),
        'causal_mask': mask.astype(np.int32)
    }
    preds = mlmodel_fp16.predict(input_dict, state=state)
    logits = preds['logits']
    logits = logits / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    pre_toks = np.random.choice(logits.shape[-1], p=probs[0])
    tok = np.concatenate([tok, [[pre_toks]]], axis=1)
    if pre_toks == eos_token_id:
        break

# Decode the generated tokens
output_text = tokenizer.decode(tok[0].tolist(), skip_special_tokens=True)
print(output_text)
but I get:
hello how are you? (1) Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds, Theodds
Process finished with exit code 0
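(For reference, a minimal sketch of a temperature-sampling step taken over only the last position's logits; this assumes the model's logits output has shape (1, seq_len, vocab_size), which would need to be checked against the converted model:)

last_logits = preds['logits'][0, -1, :].astype(np.float64) / temperature  # final position only (assumed shape)
last_logits -= np.max(last_logits)                        # numerical stability for the softmax
probs = np.exp(last_logits) / np.sum(np.exp(last_logits))
pre_toks = np.random.choice(probs.shape[-1], p=probs)     # sample one next-token id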
Hello,
I'm new to transformers and Core ML, and I have converted the Llama-3.2-1B-Instruct model from https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct to a Core ML model using:
python convert.py --model_dir /Users/jg/Documents/huggingface/models/Llama-3.2-1B-Instruct/ --output_dir ./Llama-3.2-1B-Instruct.mlpackage
Can you please provide Python code that generates text from the converted Core ML model using the ANE on my MacBook M2?
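(For reference, a minimal, untested sketch of loading the converted .mlpackage with a compute-unit setting that allows the Neural Engine; whether the model actually runs on the ANE still depends on the ops it contains:)

import coremltools as ct

# CPU_AND_NE restricts execution to the CPU and the Neural Engine;
# ct.ComputeUnit.ALL would also allow the GPU.
mlmodel = ct.models.MLModel(
    "Llama-3.2-1B-Instruct.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)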
With the code I have tried, I get errors like: