dacorvo opened 1 month ago
Hi David,
FYI I'm able to load Llama-3.1-8B with optimum-neuron 0.0.23 and a manual upgrade to the latest transformers. No compilation is required; NEFFs are loaded from the cache.
```python
from optimum.neuron import NeuronModelForCausalLM

compiler_args = {"num_cores": 8, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 4, "sequence_length": 4096}
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    export=True,
    **compiler_args,
    **input_shapes,
)
```
However, `generate()` fails with:
```
Traceback (most recent call last):
  File "/home/ubuntu/llama-31-predict.py", line 8, in <module>
    outputs = model.generate(**inputs,
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/modeling.py", line 828, in generate
    selector = TokenSelector.create(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/generation/token_selector.py", line 128, in create
    logits_processor = model._get_logits_processor(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/transformers/generation/utils.py", line 871, in _get_logits_processor
    and generation_config._eos_token_tensor is not None
AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
```
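The failure pattern can be reproduced with stand-in classes (a minimal sketch with hypothetical names, not the real transformers API). Presumably, transformers >= 4.43 reads a private attribute that `generate()` itself populates before `_get_logits_processor` is called, so a `GenerationConfig` handed to `_get_logits_processor` directly, as optimum-neuron's `TokenSelector.create` does, never has it:

```python
# Stand-in classes (hypothetical, for illustration only -- not the real
# transformers API) reproducing the version mismatch seen in the traceback.
class GenerationConfig:
    """Mimics a freshly built config: only public fields are set."""
    def __init__(self, eos_token_id=None):
        self.eos_token_id = eos_token_id

def get_logits_processor(generation_config):
    # Newer transformers reads a private attribute that generate() itself
    # sets during special-token preparation; a config passed in directly
    # was never given that attribute, so the lookup raises AttributeError.
    return generation_config._eos_token_tensor is not None

config = GenerationConfig(eos_token_id=128001)
try:
    get_logits_processor(config)
except AttributeError as exc:
    print(exc)  # 'GenerationConfig' object has no attribute '_eos_token_tensor'
```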
Environment:

```
optimum                           1.20.0
optimum-neuron                    0.0.23
transformers                      4.43.3
aws-neuronx-runtime-discovery     2.9
libneuronxla                      2.0.2335
neuronx-cc                        2.13.66.0+6dfecc895
neuronx-distributed               0.7.0
torch-neuronx                     2.1.2.2.1.0
transformers-neuronx              0.10.0.21
```
I hope you can fix this soon :) Thanks!
When I tried `763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.23-neuronx-py310-ubuntu22.04` from AWS, which points to optimum-neuron 0.0.23, the deployment failed with:

```
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
```

I believe this is caused by the older transformers version in that container.
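The stricter check in the older transformers can be sketched as follows (a paraphrase for illustration, not the actual library code): `rope_scaling` had to be a dict with exactly the two keys `type` and `factor`, while the Llama 3.1 config carries extra keys.

```python
# Paraphrase of the stricter rope_scaling validation in older transformers
# releases (illustrative, not the actual library code).
def validate_rope_scaling(rope_scaling):
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            "`rope_scaling` must be a dictionary with two fields, "
            f"`type` and `factor`, got {rope_scaling}"
        )

# The Llama 3.1 config carries extra keys, so the old check rejects it:
llama31_rope = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
try:
    validate_rope_scaling(llama31_rope)
except ValueError as exc:
    print(exc)
```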
Hello David, any progress on this? Appreciate it. I see that `763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04` still uses transformers 4.41.1.
@grhaonan I am working on it. You can track progress in my dev branch: https://github.com/huggingface/optimum-neuron/commits/bump_transformers/.
Feature request

Llama 3.1 is out and should be compatible with Neuron; however, it requires `transformers==4.43.1`, and `optimum-neuron` has pinned `transformers` to `4.41.1`.

Note that since `optimum` also pins the `transformers` version to a specific range, `optimum` must also be modified as a prerequisite (see https://github.com/huggingface/optimum/pull/1968).

Motivation
Everybody wants the latest Llama.
Your contribution
Most of the changes are likely to be related to training, but I will be happy to review.