huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Missing packages when running the "Supervised Fine-Tuning of Llama 3 8B on one AWS Trainium instance" sample #720

Open yahavb opened 3 weeks ago

yahavb commented 3 weeks ago

System Info

PyTorch 1.13.1 with NeuronX Training and HuggingFace transformers
Neuron 2.18.0
Python - Version Options - 3.10 (py310)
DLC 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training-neuronx:1.13.1-transformers4.36.2-neuronx-py310-sdk2.18.0-ubuntu20.04

Who can help?

@michaelbenayoun @JingyaHuang

Information

Tasks

Reproduction (minimal, reproducible, runnable)

The precompilation step in https://huggingface.co/docs/optimum-neuron/en/training_tutorials/sft_lora_finetune_llm fails because of many missing packages. Is there a specific DLC we can use?
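For reference, a minimal way to see which of the tutorial's dependencies the DLC already ships before installing anything extra (a diagnostic sketch; the grep pattern is only illustrative):

pip list | grep -Ei "neuron|torch|transformers|optimum|peft|trl|datasets"
pip check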

Expected behavior

The tutorial should run successfully. The "Fine-tune and Test Llama-3 8B on AWS Trainium" tutorial works without issue with the same settings.

michaelbenayoun commented 3 weeks ago

Do you have the names of the missing packages, by any chance?

yahavb commented 3 weeks ago
docker run -it --privileged  -v /home/ec2-user:/home/ubuntu/ 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training-neuronx:1.13.1-transformers4.36.2-neuronx-py310-sdk2.18.0-ubuntu20.04 bash

apt-get update 
...
pip install --upgrade pip
....
pip3 install peft trl
...
git clone https://github.com/huggingface/optimum-neuron.git
cd optimum-neuron
pip3 install .
....
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
neuronx-cc 2.13.66.0+6dfecc895 requires protobuf<3.20, but you have protobuf 3.20.3 which is incompatible.
....
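One way to work around the protobuf conflict reported above, assuming nothing else in the image strictly needs protobuf>=3.20, is to pin it back to a version neuronx-cc accepts after the extra installs (a sketch, not part of the tutorial):

pip3 install "protobuf<3.20"
pip3 check   # verify the neuronx-cc constraint is satisfied again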
#!/bin/bash
set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/"

PROCESSES_PER_NODE=8

NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py \
  --model_id $MODEL_NAME \
  --num_train_epochs $NUM_EPOCHS \
  --do_train \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --max_steps $MAX_STEPS \
  --per_device_train_batch_size $BS \
  --per_device_eval_batch_size $BS \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --gradient_checkpointing true \
  --bf16 \
  --zero_1 false \
  --tensor_parallel_size $TP_DEGREE \
  --pipeline_parallel_size $PP_DEGREE \
  --logging_steps $LOGGING_STEPS \
  --save_total_limit 1 \
  --output_dir $OUTPUT_DIR \
  --lr_scheduler_type "constant" \
  --overwrite_output_dir
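A side note, unrelated to the failure itself: inside a plain container there is no Slurm, so $SLURM_JOB_ID is empty and the trace below ends up with OUTPUT_DIR=output-. A defensive variant (hypothetical, not from the tutorial) would be:

OUTPUT_DIR=output-${SLURM_JOB_ID:-local}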
....
+ export NEURON_FUSE_SOFTMAX=1
+ NEURON_FUSE_SOFTMAX=1
+ export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
+ NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
+ export MALLOC_ARENA_MAX=64
+ MALLOC_ARENA_MAX=64
+ export 'NEURON_CC_FLAGS=--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/'
+ NEURON_CC_FLAGS='--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/'
+ PROCESSES_PER_NODE=8
+ NUM_EPOCHS=1
+ TP_DEGREE=2
+ PP_DEGREE=1
+ BS=1
+ GRADIENT_ACCUMULATION_STEPS=8
+ LOGGING_STEPS=1
+ MODEL_NAME=meta-llama/Meta-Llama-3-8B
+ OUTPUT_DIR=output-
+ '[' '' = 1 ']'
+ MAX_STEPS=-1
+ XLA_USE_BF16=1
+ neuron_parallel_compile torchrun --nproc_per_node 8 docs/source/training_tutorials/sft_lora_finetune_llm.py --model_id meta-llama/Meta-Llama-3-8B --num_train_epochs 1 --do_train --learning_rate 5e-5 --warmup_ratio 0.03 --max_steps -1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --gradient_checkpointing true --bf16 --zero_1 false --tensor_parallel_size 2 --pipeline_parallel_size 1 --logging_steps 1 --save_total_limit 1 --output_dir output- --lr_scheduler_type constant --overwrite_output_dir
Traceback (most recent call last):
  File "/usr/local/bin/neuron_parallel_compile", line 5, in <module>
    from optimum.neuron.utils.neuron_parallel_compile import main
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/utils/neuron_parallel_compile.py", line 8, in <module>
    from torch_neuronx.parallel_compile.neuron_parallel_compile import LOGGER as torch_neuronx_logger
ModuleNotFoundError: No module named 'torch_neuronx.parallel_compile'
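A quick way to check whether the torch-neuronx build inside the DLC actually ships the parallel_compile subpackage that optimum-neuron imports here (a diagnostic sketch; the package path is derived at runtime):

pip3 show torch-neuronx torch-xla
# list the installed package directory; a missing parallel_compile/ folder means the wheel predates that module
ls "$(python -c 'import os, torch_neuronx; print(os.path.dirname(torch_neuronx.__file__))')"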

I tried to install the Neuron runtime packages and tools:

echo 'deb https://apt.repos.neuron.amazonaws.com jammy main' > /etc/apt/sources.list.d/neuron.list
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - && apt-get update
apt-get install -y aws-neuronx-collectives=2.* aws-neuronx-runtime-lib=2.* aws-neuronx-tools=2.*
echo "export PATH=/opt/aws/neuron/bin:\$PATH" >> /root/.bashrc
PATH="${PATH}:/opt/aws/neuron/bin"

After that, python -c "import torch_neuronx" runs without errors, but it did not help.
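The apt packages above install the system-level runtime and tools rather than Python wheels, so they cannot add the missing Python module. The exact import that optimum-neuron performs (see the traceback above) can be checked directly:

python -c "from torch_neuronx.parallel_compile.neuron_parallel_compile import LOGGER" \
  && echo "parallel_compile is available" \
  || echo "the installed torch-neuronx wheel does not ship parallel_compile"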

I then removed neuron_parallel_compile and got:

...
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "/home/ubuntu/optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.py", line 11, in <module>
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
    from optimum.neuron import NeuronHfArgumentParser as HfArgumentParser
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/__init__.py", line 18, in <module>
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    from .trainers import Seq2SeqTrainiumTrainer, TrainiumTrainer
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 20, in <module>
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    from transformers import Seq2SeqTrainer, Trainer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 26, in <module>
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1462, in __getattr__
    from .trainer import Trainer
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 180, in <module>
    import torch_xla.distributed.spmd as xs
ModuleNotFoundError: No module named 'torch_xla.distributed.spmd'
...

So I tried reinstalling:

pip install torch-neuronx optimum[neuron] transformers

and still got the same ModuleNotFoundError: No module named 'torch_xla.distributed.spmd' error
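The trainer.py shown in the traceback imports torch_xla.distributed.spmd, which is not present in the torch-xla that ships with the torch 1.13 based SDK, so the installed transformers is likely newer than the 4.36.2 the DLC tag advertises (probably pulled in by pip3 install . or the later pip install). A hedged sketch to check the versions and pin transformers back to the tagged one; note that optimum-neuron from the main branch may itself require a newer transformers, so the right pin may differ:

pip3 show transformers torch-xla | grep -E "^(Name|Version)"
# 4.36.2 is taken from the DLC image tag; adjust if optimum-neuron requires otherwise
pip3 install "transformers==4.36.2"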