huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Add support for Llama3.1 #664

Open dacorvo opened 1 month ago

dacorvo commented 1 month ago

Feature request

Llama 3.1 is out and should be compatible with Neuron; however, it requires transformers==4.43.1, while optimum-neuron has pinned transformers to 4.41.1.

Note that since optimum also pins the transformers version to a specific range, optimum must be modified first as a prerequisite (see https://github.com/huggingface/optimum/pull/1968).
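For illustration only (the exact file and version bounds are up to the maintainers), the change amounts to bumping the pin, e.g.:

# Illustrative sketch, not the actual diff: bump the exact transformers pin,
# assuming it lives in setup.py's INSTALL_REQUIRES as in similar HF repos.
INSTALL_REQUIRES = [
    "transformers == 4.43.1",  # was "transformers == 4.41.1"
]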

Motivation

Everybody wants the latest Llama.

Your contribution

Most of the changes are likely to be related to training, but I will be happy to review.

juliensimon commented 1 month ago

Hi David,

FYI, I'm able to load Llama-3.1-8B with optimum-neuron 0.0.23 and a manual upgrade to the latest transformers. No compilation is required; NEFFs are loaded from the cache.

from optimum.neuron import NeuronModelForCausalLM

# Neuron compilation settings: shard across 8 cores, cast weights to fp16
compiler_args = {"num_cores": 8, "auto_cast_type": "fp16"}
# Static shapes the NEFFs are compiled for
input_shapes = {"batch_size": 4, "sequence_length": 4096}

# export=True compiles the model (or fetches matching NEFFs from the cache)
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    export=True,
    **compiler_args,
    **input_shapes,
)
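For reference, here is a minimal sketch of the inference step in llama-31-predict.py (the tokenizer setup is my assumption; the original script isn't shown):

from transformers import AutoTokenizer

# Assumed setup: tokenize a prompt for the compiled model above
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
inputs = tokenizer("Hello, my name is", return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))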

However, generate() fails with:

Traceback (most recent call last):
  File "/home/ubuntu/llama-31-predict.py", line 8, in <module>
    outputs = model.generate(**inputs,
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/modeling.py", line 828, in generate
    selector = TokenSelector.create(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/generation/token_selector.py", line 128, in create
    logits_processor = model._get_logits_processor(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/transformers/generation/utils.py", line 871, in _get_logits_processor
    and generation_config._eos_token_tensor is not None
AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
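A minimal stopgap sketch, assuming the cause is that optimum-neuron's TokenSelector calls _get_logits_processor() without the _prepare_special_tokens() step that transformers 4.43 relies on to set this private attribute (unverified; other private attributes may be missing too, so downgrading transformers is the safer route):

import torch

# Unverified workaround: manually populate the private attribute that
# transformers 4.43's _get_logits_processor() expects to find.
gen_config = model.generation_config
gen_config._eos_token_tensor = torch.tensor(gen_config.eos_token_id)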

Environment:

optimum                       1.20.0
optimum-neuron                0.0.23
transformers                  4.43.3
aws-neuronx-runtime-discovery 2.9
libneuronxla                  2.0.2335
neuronx-cc                    2.13.66.0+6dfecc895
neuronx-distributed           0.7.0
torch-neuronx                 2.1.2.2.1.0
transformers-neuronx          0.10.0.21

I hope you can fix this soon :) Thanks!

grhaonan commented 2 weeks ago

When I tried 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.23-neuronx-py310-ubuntu22.04 from AWS, which points to optimum-neuron 0.0.23, the deployment failed with:

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I believe this is caused by the older transformers version in this container.
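For context (my reading, based on the error above): Llama 3.1 ships the new llama3 rope_type, while older transformers releases only validate the legacy two-field form:

# rope_scaling from the Llama 3.1 config, as echoed in the error above:
rope_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
# transformers before 4.43 rejects anything but {"type": ..., "factor": ...}
# for Llama, hence the ValueError during config validation.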

grhaonan commented 1 week ago

Hello David, any progress on this? I'd appreciate an update. I see that 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04 still uses transformers 4.41.1.

dacorvo commented 1 week ago

@grhaonan I am working on it. You can track progress in my dev branch: https://github.com/huggingface/optimum-neuron/commits/bump_transformers/.