Closed DhruvaBansal00 closed 1 year ago
Thank you for reporting this. When did you create your environment? It seems that there is an error with the new AMI. Can you use the previous one?
I created the environment yesterday, using this AMI: huggingface-neuron-2023-06-26T09-27-02.137Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a. I can try the previous one.
Trying huggingface-neuron-2023-04-20T11-02-28.279Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a
Ok that AMI works, thanks for your quick response!
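For anyone comparing the two images mentioned above, here is a minimal sketch that picks the older one by the timestamp embedded in its name. It assumes (this is not confirmed anywhere in the thread) that the `YYYY-MM-DDTHH-MM-SS.mmmZ` fragment in the AMI name is the build time; `ami_build_time` is a hypothetical helper, not part of any AWS or Hugging Face tooling.

```python
import re
from datetime import datetime, timezone

# Assumption: the AMI name embeds a UTC build timestamp such as
# "2023-06-26T09-27-02.137Z" (dashes instead of colons in the time part).
AMI_TIMESTAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2})-(\d{2})-(\d{2})\.(\d{3})Z"
)


def ami_build_time(name: str) -> datetime:
    """Hypothetical helper: extract the timestamp embedded in an AMI name."""
    match = AMI_TIMESTAMP.search(name)
    if match is None:
        raise ValueError(f"no timestamp found in AMI name: {name!r}")
    year, month, day, hour, minute, second, millis = map(int, match.groups())
    return datetime(year, month, day, hour, minute, second,
                    millis * 1000, tzinfo=timezone.utc)


# The two AMI names quoted in this thread.
amis = [
    "huggingface-neuron-2023-06-26T09-27-02.137Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a",
    "huggingface-neuron-2023-04-20T11-02-28.279Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a",
]

# The "previous" AMI is the one with the earliest embedded timestamp.
previous = min(amis, key=ami_build_time)
print(previous)
```

Sorting by the parsed timestamp rather than by the raw string keeps the comparison correct even if the name prefix ever changes.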
I had to undo my PR to make it work on the previous AMI - https://github.com/philschmid/aws-neuron-samples/pull/2
I am trying to train a T5 model. Do you know whether this AMI can be used for that?
Thank you! We are working on fixing that ASAP!
Hey!
I am trying to follow this guide: https://huggingface.co/docs/optimum-neuron/tutorials/fine_tune_bert to fine-tune BERT on a trn1.2xlarge instance. I set up the datasets as described in the blog and then ran the training script, but the Neuron core usage stays at 0%. This matters to me because my expected training time is close to 5 hours.
cc: @philschmid