Closed by xffxff 8 months ago
Thanks for reporting this! Could you try again with this fix: https://github.com/huggingface/nanotron/pull/100
Sorry, I cannot reproduce the issue at the moment. It's possible that an earlier configuration error on my part caused the problem. I will close this issue now.
I encountered an error while executing the examples/train_tiny_llama.sh script from commit ff3c7746577948743da08c4868aca46cbc0c110b, without any modifications to the code or configurations, on an 8-GPU node. Below is the error log: