Closed fatemi closed 2 months ago
I think we should check not for +1
, but for B * T +1
. Because we want to make sure that the next batch can be fetched.
We moved the self.current_position
to the starting position of the new batch, so we need to check if that new batch can be fetched.
I've made a PR for that change: https://github.com/karpathy/build-nanogpt/pull/42
Due to line 246, all the processes can fetch their next batch, no need for another B * T
.
I think that's not the case. Imagine this scenario:
len(self.tokens)
equals 94
num_processes = 2
B=1
T=5
The first process will read tokens 0:6
, 10:16
...
After 8th step, line 246 will be self.current_position += B * T * self.num_processes
which is 80 + 1*5*2 = 90
.
In line 248, you'd check that 90 + 1 > 94
=> False
, so you'd try to fetch tokens[90:96] which is out of bounds.
Do you agree?
self.current_position
has already been moved at L246 and is ready for the next batch. The condition at L248 looks at yet another move, which is not correct [this way, it basically dismisses an additional block_size at the end of file]. It should simply look at currect_position + 1 to account only for the additional target token.