System Info
Standard trn1n.32xlarge instance with the Hugging Face AMI.
Who can help?
No response
Information
[X] The official example scripts
[ ] My own modified scripts
Tasks
[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Use the notebook provided in the repo on fine-tuning Llama 2, but with Llama 3 instead, and add 3 custom tokens. During the pre-compile stage it fails with a division-by-8 error: Llama 3's vocabulary size is 128,256, and adding 3 tokens makes it 128,259, which is not divisible by the tensor parallel degree of 8. Bug report #175 seems related, but I'm not sure how to modify the provided notebook so that it will work here.
I understand that this is a tensor parallelism setting and that I can modify it, but what is the standard or proper way of dealing with token embedding sizes like this?
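For context, here is a minimal sketch of the fix as I currently understand it, using plain transformers and assuming the `pad_to_multiple_of` argument of `resize_token_embeddings` available in recent versions (the model ID and token names below are placeholders, not the ones from the notebook):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the notebook's actual model ID may differ.
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Add the 3 custom tokens (illustrative names).
tokenizer.add_tokens(["<custom_1>", "<custom_2>", "<custom_3>"])

# 128256 + 3 = 128259 is not divisible by 8, so pad the embedding
# matrix up to the next multiple of 8 (128264) so that each of the
# 8 tensor-parallel ranks gets an equally sized shard.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
```

Is `pad_to_multiple_of=8` the intended approach here, or does optimum-neuron expect this to be handled differently before compilation?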
Expected behavior
The notebook runs as intended with Llama 3 and the 3 added custom tokens.