Pretraining with Llama Model - num_samples=0 Error

System Info

vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

When I run the run_clm.py script from the Language Modeling examples with the Llama 3.1 model, it fails with an error saying num_samples=0:

[rank6]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 143, in __init__
[rank6]:     raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
[rank6]: ValueError: num_samples should be a positive integer value, but got num_samples=0

When I run the same script with gpt2-xl, it works.
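One possible explanation is offered here as an unconfirmed assumption. It supposes the optimum-habana run_clm.py follows the upstream transformers script. That script concatenates the tokenized texts and splits them into chunks of block_size tokens, discarding any remainder that does not fill a whole chunk.

If --block_size is not passed, the block size is taken from the tokenizer's model_max_length. That is 1024 for gpt2-xl but far larger for Llama 3.1. The chunking runs through datasets' map in batches (1000 examples by default), so the remainder is dropped per batch. If a single block is longer than the total token count of a batch, every batch produces zero blocks. The training set then ends up empty, which RandomSampler rejects with the error above.

The sketch below mimics that chunking step. The function name group_texts, the batch contents and both block sizes are illustrative, not taken from this report.

```python
from itertools import chain


def group_texts(examples, block_size):
    """Concatenate tokenized texts and split them into block_size chunks.

    Sketch of the chunking step assumed to run inside run_clm.py; trailing
    tokens that do not fill a whole block are dropped.
    """
    # Flatten each column (e.g. "input_ids") into one long token list.
    concatenated = {k: list(chain(*examples[k])) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Round down to a multiple of block_size; the remainder is discarded.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Causal LM training uses the inputs themselves as labels.
    result["labels"] = result["input_ids"].copy()
    return result


# Hypothetical map batch: 1000 tokenized lines of 50 tokens each (50,000 tokens).
batch = {
    "input_ids": [[1] * 50 for _ in range(1000)],
    "attention_mask": [[1] * 50 for _ in range(1000)],
}

small = group_texts(batch, block_size=1024)    # GPT-2-sized block
large = group_texts(batch, block_size=131072)  # hypothetical very large block

print(len(small["input_ids"]))  # 48 blocks
print(len(large["input_ids"]))  # 0 blocks, i.e. an empty dataset
```

If this assumption holds, rerunning run_clm.py with an explicit, smaller --block_size (for example 1024) should make the error go away. That would be a quick way to confirm or rule out this cause.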

Expected behavior

Pretraining the Llama 3.1 model with run_clm.py runs without errors, as it does with gpt2-xl.

huggingface / optimum-habana