karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License

Possible bugs in the data loading functions #321

Open PeterZhizhin opened 4 months ago

PeterZhizhin commented 4 months ago

First, we read B*T+1 tokens, but then advance the read position by only B*T tokens.

Then, there is this if statement:

    if (loader->current_position + (loader->num_processes * B * T + 1) * sizeof(int) > loader->file_size)

Possibly, we should remove the loader->num_processes multiplication here.

We need to verify that this is the way these functions should work.
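To see where the two forms of the check disagree, here is a minimal sketch that evaluates both side by side. The function names and the plain `size_t` parameters are hypothetical stand-ins for the `loader` fields, not code from the repository. It does not settle which form is correct, since that depends on how `current_position` is laid out across processes.

```c
#include <assert.h>
#include <stddef.h>

/* Check as quoted above: treats the window as num_processes*B*T+1 tokens.
   pos and file_size are in bytes, as in the quoted condition. */
int wraps_with_processes(size_t pos, size_t num_processes,
                         size_t B, size_t T, size_t file_size) {
    assert(B > 0 && T > 0 && num_processes > 0);
    return pos + (num_processes * B * T + 1) * sizeof(int) > file_size;
}

/* Proposed variant: only the B*T+1 tokens that one read actually touches. */
int wraps_single_window(size_t pos, size_t B, size_t T, size_t file_size) {
    assert(B > 0 && T > 0);
    return pos + (B * T + 1) * sizeof(int) > file_size;
}
```

With a single process the two checks agree. With several processes the first one resets earlier, so tokens near the end of the file that a single B*T+1 read could still cover are skipped.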

gordicaleksa commented 3 months ago

Hey @PeterZhizhin, feel free to close this issue. The +1 is not a bug: the extra token is used only as the last target of the batch being loaded. In the next batch it becomes part of the input rather than a target, so advancing by B*T is actually fine.
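The overlap can be illustrated with a small in-memory sketch. `ToyLoader` and `toy_next_batch` are hypothetical names used only for this illustration. The sketch indexes an array instead of reading a file, and it assumes a single process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for the loader (not the llm.c struct). */
typedef struct {
    const int *tokens;  /* whole token stream */
    size_t num_tokens;  /* number of tokens in the stream */
    size_t pos;         /* index of the first token of the next batch */
    size_t B, T;        /* batch size and sequence length */
} ToyLoader;

/* Exposes a window of B*T+1 tokens: inputs are window[0 .. B*T-1],
   targets are window[1 .. B*T]. The position then advances by B*T only,
   so the last target of this batch is the first input of the next one. */
void toy_next_batch(ToyLoader *l, const int **inputs, const int **targets) {
    size_t n = l->B * l->T;
    assert(l->num_tokens >= n + 1);      /* stream must hold one full window */
    if (l->pos + n + 1 > l->num_tokens) {
        l->pos = 0;                      /* not enough tokens left: wrap around */
    }
    *inputs = l->tokens + l->pos;
    *targets = l->tokens + l->pos + 1;   /* same window, shifted by one token */
    l->pos += n;
}
```

Calling `toy_next_batch` twice on a stream such as 0, 1, 2, ... with B = 1 and T = 4 yields inputs 0..3 with targets 1..4, then inputs 4..7 with targets 5..8. Token 4 appears once as a target and once as an input, and no token is skipped.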