Pagey opened this issue 2 months ago.
Hi @Pagey -
The KGen class inherits from tf.keras.utils.Sequence. In tf.keras.utils.Sequence (the PyDataset class), the __getitem__() method should return a complete batch, and the __len__ method should return the number of batches in the dataset. For more details, see here.
So in the KGen class, the __getitem__() method should return elements from the underlying data. Here, self.alist[idx] returns data from self.alist, whereas returning idx returns only the index. I have attached a gist for reference.
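To make the batch contract concrete, here is a minimal plain-Python sketch. The class BatchSequence is hypothetical and does not subclass the real Keras base class; it only follows the same rules (__len__ gives the number of batches, __getitem__ returns one whole batch), so it runs without TensorFlow installed.

```python
import math


class BatchSequence:
    """Hypothetical stand-in that follows the Sequence __len__/__getitem__ contract."""

    def __init__(self, data, batch_size):
        self.data = list(data)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.data) / self.batch_size)

    def __getitem__(self, idx):
        # Return one complete batch, not a single element or the raw index.
        if idx < 0 or idx >= len(self):
            raise IndexError(f"batch index {idx} out of range")
        start = idx * self.batch_size
        return self.data[start:start + self.batch_size]


seq = BatchSequence(range(10), batch_size=4)
print(len(seq))  # 3
print(seq[0])    # [0, 1, 2, 3]
print(seq[2])    # [8, 9]
```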
Thanks @mehtamansi29 - it looks like you changed the code in the gist between the TensorFlow 2.15 and 2.16 versions? The
def __getitem__(self, idx):
    return idx  # self.alist[idx]
is supposed to represent an infinite data generator, and thus is not limited to the length of self.alist. It could just as well have been written as: return np.random.random()
In any case, this represents a difference in behavior between the two versions: one that terminates after len()/__len__ batches (TensorFlow 2.15) and one that does not (TensorFlow 2.16).
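One plausible explanation, assuming the reproduction iterates over the generator directly with a for loop (the script itself is not shown here): the Keras 2 Sequence that ships with TensorFlow 2.15 defines __iter__ as a loop over range(len(self)), so __len__ bounds the iteration. If the 2.16 base class has no such __iter__ (an unverified assumption), Python falls back to its legacy sequence protocol, which calls __getitem__(0), __getitem__(1), ... until an IndexError is raised and never consults __len__. That would also explain why self.alist[idx] stops (list indexing raises IndexError past the end) while return idx never does. The class names below are hypothetical and the sketch uses plain Python only.

```python
import itertools


class BoundedByLen:
    """Mimics the Keras 2 Sequence: __iter__ stops after len(self) items."""

    def __len__(self):
        return 3

    def __getitem__(self, idx):
        return idx  # "infinite" generator: every index is valid

    def __iter__(self):
        # Same shape as the Keras 2 Sequence.__iter__: bounded by __len__.
        for i in range(len(self)):
            yield self[i]


class LegacyProtocol:
    """No __iter__: a for loop calls __getitem__ until IndexError is raised."""

    def __len__(self):
        return 3  # ignored by the fallback iteration protocol

    def __getitem__(self, idx):
        return idx  # never raises IndexError, so iteration never ends


print(list(BoundedByLen()))                         # [0, 1, 2]
# islice caps the otherwise endless iteration so this example terminates.
print(list(itertools.islice(LegacyProtocol(), 5)))  # [0, 1, 2, 3, 4]
```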
I saw that in the new version the __len__ method is replaced by num_batches, but it doesn't seem to have the same effect as it did in 2.15 either.
How should one terminate after __len__/num_batches batches in TensorFlow 2.16 in the case of an infinitely generated set?
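One workaround to consider, offered as a sketch rather than an official recommendation: have __getitem__ raise IndexError once idx reaches len(self). That ends a plain for loop whether or not the base class defines a bounded __iter__, while still generating data on the fly. KGenSketch is a hypothetical stand-in for KGen; in the real class the same check would go at the top of the existing __getitem__. If the generator is consumed by model.fit rather than a for loop, passing steps_per_epoch may be another way to bound each epoch.

```python
import random


class KGenSketch:
    """Hypothetical stand-in for KGen: generated data, stopped after n_batches."""

    def __init__(self, n_batches, batch_size=4, seed=0):
        self.n_batches = n_batches
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        if idx >= len(self):
            # Explicit stop signal for Python's fallback iteration protocol.
            raise IndexError(idx)
        # Random floats stand in for np.random.random() from the discussion.
        return [self.rng.random() for _ in range(self.batch_size)]


batches = list(KGenSketch(n_batches=3))
print(len(batches))     # 3
print(len(batches[0]))  # 4
```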
Hi there - paraphrasing an issue from 2018: the change is the line return idx  # self.alist[idx] in __getitem__. This is relevant in the case of generated datasets, i.e. it looks as though the __len__ value is ignored, and it used not to be? The above code on TensorFlow 2.15 (Python 3.10.13, Ubuntu 20.04) produces this output:
and on TensorFlow 2.16 (Python 3.10.13, Ubuntu 20.04) produces this output: