huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

AttributeError: can't set attribute 'deepspeed_plugin' #735

Open anushka0415 opened 1 week ago

anushka0415 commented 1 week ago

System Info

accelerate                    1.1.1
neuronx-cc                    2.14.227.0+2d4f85be
neuronx-distributed           0.8.0
neuronx-distributed-training  1.0.0
optimum                       1.22.0
optimum-neuron                0.0.25
torch                         2.1.2
torch-neuronx                 2.1.2.2.3.1
torch-xla                     2.1.4
torchvision                   0.16.2
triton                        2.1.0
trl                           0.12.1

Who can help?

@michaelbenayoun @JingyaHuang


Reproduction (minimal, reproducible, runnable)

set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
# NEURON_CC_FLAGS value is truncated in the original report
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cach…"

PROCESSES_PER_NODE=2
NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE train.py \
    --model_id $MODEL_NAME \
    --num_train_epochs $NUM_EPOCHS \
    --do_train \
    --learning_rate 5e-5 \
    --warmup_ratio 0.03 \
    --max_steps $MAX_STEPS \
    --per_device_train_batch_size $BS \
    --per_device_eval_batch_size $BS \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --gradient_checkpointing true \
    --bf16 \
    --zero_1 false \
    --tensor_parallel_size $TP_DEGREE \
    --pipeline_parallel_size $PP_DEGREE \
    --logging_steps $LOGGING_STEPS \
    --save_total_limit 1 \
    --output_dir $OUTPUT_DIR \
    --lr_scheduler_type "constant" \
    --overwrite_output_dir

Expected behavior

Compilation should complete successfully.

anushka0415 commented 1 week ago

Traceback (most recent call last):
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 112, in <module>
    main()
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 108, in main
    training_function(script_args, training_args)
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 76, in training_function
    trainer = NeuronSFTTrainer(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1753, in __init__
    super().__init__(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 179, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1514, in __init__
    return Trainer.__init__(self, *args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/transformers/trainer.py", line 430, in __init__
    self.create_accelerator_and_postprocess()
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 279, in create_accelerator_and_postprocess
    self.accelerator = NeuronAccelerator(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/accelerate/accelerator.py", line 153, in __init__
    super().__init__(**full_kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/accelerate/accelerator.py", line 415, in __init__
    self.state = AcceleratorState(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/accelerate/state.py", line 151, in __init__
    self.deepspeed_plugin = None
AttributeError: can't set attribute 'deepspeed_plugin'

vedant123454 commented 1 week ago

Issue: Incorrect Variable Name in state.py

In optimum/neuron/accelerate/state.py, at line 151, the code currently sets:

self.deepspeed_plugin = None

This should be corrected to:

self.deepspeed_plugins = None

Make this change in a local checkout of the repo and build optimum-neuron from source.
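
For context, the error occurs because newer accelerate releases appear to expose deepspeed_plugin on AcceleratorState as a read-only property backed by a plural deepspeed_plugins attribute, so a plain attribute assignment raises. Below is a minimal sketch of that failure mode, using a hypothetical StateSketch class in place of the real AcceleratorState:

# Hypothetical stand-in for accelerate's AcceleratorState; sketch only.
class StateSketch:
    def __init__(self):
        self.deepspeed_plugins = None  # writable backing attribute (plural)

    @property
    def deepspeed_plugin(self):
        # Read-only view with no setter, mirroring newer accelerate versions.
        return self.deepspeed_plugins

state = StateSketch()
print(state.deepspeed_plugin)  # reading works: prints None
try:
    state.deepspeed_plugin = None  # same assignment as state.py line 151
except AttributeError as exc:
    print(exc)  # on Python 3.10: can't set attribute 'deepspeed_plugin'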

michaelbenayoun commented 3 days ago

@vedant123454's solution might work.

As accelerate is a fast-moving library, and we extend it quite a bit in optimum-neuron to make everything work, we pin the supported version and bump it with every release. Right now, the officially supported accelerate version is 0.29.2, but 1.1.1 is installed on your system.
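
Until a release that supports accelerate 1.x is out, the simplest workaround is to downgrade with pip install accelerate==0.29.2. To catch such a mismatch before training starts, a minimal guard along these lines is one option (the pinned version comes from the comment above; packaging should already be available since accelerate depends on it):

# Hedged sketch: fail fast when the installed accelerate version does not
# match the one this optimum-neuron release officially supports.
import accelerate
from packaging import version

SUPPORTED = "0.29.2"  # per the maintainer's comment; adjust for your release
if version.parse(accelerate.__version__) != version.parse(SUPPORTED):
    raise RuntimeError(
        f"optimum-neuron expects accelerate=={SUPPORTED}, found "
        f"{accelerate.__version__}; run `pip install accelerate=={SUPPORTED}`."
    )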