huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Pretraining with Llama model: num_samples=0 error #1396

Open: saisuryateja1436 opened this issue 11 hours ago

saisuryateja1436 commented 11 hours ago

System Info

vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest

Reproduction

When I run the run_clm.py script from the language-modeling examples with the Llama 3.1 model, I get an error saying num_samples=0:

[rank6]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py", line 143, in __init__
[rank6]:     raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
[rank6]: ValueError: num_samples should be a positive integer value, but got num_samples=0

The same setup works with gpt2-xl.
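The error is raised by PyTorch's RandomSampler, which refuses to build a sampler over an empty dataset. That means the training split had zero examples by the time the Trainer created its dataloader. One plausible but unconfirmed mechanism assumes run_clm.py concatenates tokenized text and cuts it into fixed-length blocks, as the upstream Transformers example does, and drops any remainder shorter than `block_size`. Under that assumption, a `block_size` larger than the available token count produces no blocks at all. The sketch below is a simplified, hypothetical re-implementation of that chunking step, not the script's actual code. The toy batch and block sizes are made up for illustration.

```python
from itertools import chain


def group_texts(examples: dict, block_size: int) -> dict:
    """Concatenate tokenized examples and split them into block_size chunks.

    Hypothetical, simplified sketch of the chunking step used in the
    Transformers language-modeling examples: any trailing tokens that do
    not fill a whole block are discarded.
    """
    # Flatten each column (input_ids, attention_mask, ...) into one long list.
    concatenated = {k: list(chain(*examples[k])) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Round down to a multiple of block_size. If there are fewer tokens
    # than block_size, this becomes 0 and no blocks are produced.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Causal LM labels are a copy of the inputs (the model shifts them internally).
    result["labels"] = [block.copy() for block in result["input_ids"]]
    return result


# Hypothetical toy batch: three short "documents", 10 tokens in total.
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1], [1, 1, 1]],
}

small = group_texts(batch, block_size=4)
large = group_texts(batch, block_size=1024)

print(len(small["input_ids"]))  # 2 blocks; tokens 9 and 10 are dropped
print(len(large["input_ids"]))  # 0 blocks, i.e. an empty training set
```

If this turns out to be the cause, a smaller `--block_size` or more training data should avoid the empty split. The effective block size can differ between gpt2-xl and Llama 3.1 because their tokenizers and context lengths differ. That could explain why only Llama fails, but this stays a guess until the exact command is known.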

Expected behavior

Pretraining of the Llama model should run without this error, as it does with gpt2-xl.

regisss commented 11 hours ago

Can you please share the command you ran?