huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Optimum Neuron v 0.0.20 w/ Neuron 2.x taking too long to fine-tune TinyLlama in Amazon SageMaker #518

Open ari-vedant-jain opened 3 months ago

ari-vedant-jain commented 3 months ago

System Info

optimum-neuron 0.0.20
neuronx-cc 2.*
python 3.10

Who can help?

No response

Reproduction (minimal, reproducible, runnable)

========== requirements.txt ==========
regex
tensorboard
sentencepiece
datasets==2.14.7
evaluate==0.4.1
transformers==4.36.2
git+https://github.com/huggingface/optimum-neuron.git
https://pip.repos.neuron.amazonaws.com/neuronx-distributed/neuronx_distributed-0.6.0-py3-none-linux_x86_64.whl

============PREREQUISITES=================

sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF

sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

sudo yum update -y

sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r) -y

sudo yum install git -y

sudo yum install aws-neuronx-dkms-2.* -y

sudo yum install aws-neuronx-collectives-2.* -y

sudo yum install aws-neuronx-runtime-lib-2.* -y

sudo yum install aws-neuronx-tools-2.* -y

export PATH=/opt/aws/neuron/bin:$PATH

python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

python -m pip install wget
python -m pip install awscli
python -m pip install boto3==1.34.53
python -m pip install botocore==1.34.53

python -m pip install neuronx-cc==2.* torch-neuronx torchvision

sudo yum install wget -y

=================

wget -P 01_finetuning https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/0cd5851b-5253-4a65-b351-70d0d80a7fb5/01_finetuning/requirements.txt

wget -P 01_finetuning https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/0cd5851b-5253-4a65-b351-70d0d80a7fb5/01_finetuning/run_clm.py

wget -P 01_finetuning https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/0cd5851b-5253-4a65-b351-70d0d80a7fb5/01_finetuning/Finetune-TinyLlama-1.1B.ipynb

wget -P 02_inference https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/0cd5851b-5253-4a65-b351-70d0d80a7fb5/02_inference/Inference-TinyLlama-1.1B.ipynb

Please see the attached Python notebook: Finetune-TinyLlama-1.1B.ipynb.json

Expected behavior

Please see the attached full log: log-events-viewer-result (1).csv

ari-vedant-jain commented 3 months ago

run_clm.py.txt

dacorvo commented 3 months ago

cc @michaelbenayoun.