chimezie opened this issue 1 week ago
The current implementation of iterate_batches produces batches for all but the final N items in the dataset, where N is the dataset size modulo the batch size. So, if your dataset size is a multiple of the batch size, you will eventually train on every item in the dataset. If it is not, the remaining N items will never be included in training, regardless of how many iterations you set.
Yes, that's how it works now. Probably we should just change it to not drop the last batch even if it is not the same size as the others. I will mark this as an enhancement.
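A sketch of what that change could look like, assuming the batches are built with a strided slice over a list of indices (the variable names and sizes below are illustrative, not the actual mlx-examples code):

```python
# Illustrative indices and batch size (not the actual trainer code).
idx = list(range(10))
batch_size = 4

# Current behavior (sketch): the range stops before a partial batch,
# so the last len(idx) % batch_size indices are silently dropped.
dropped = [idx[i : i + batch_size] for i in range(0, len(idx) - batch_size + 1, batch_size)]
print(dropped)  # [[0, 1, 2, 3], [4, 5, 6, 7]] -- indices 8 and 9 are lost

# Proposed change (sketch): iterate over the full range so the final,
# smaller batch is kept.
kept = [idx[i : i + batch_size] for i in range(0, len(idx), batch_size)]
print(kept)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```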
Below is a very minimal test case that replicates this (it generates batches totaling at most 46 items without ever reaching the remaining data before stopping):
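A minimal sketch along those lines, using a simplified stand-in for iterate_batches that mirrors the fixed-batch-composition behavior described above (the dataset contents and sizes here are illustrative, not the original snippet):

```python
import numpy as np

def iterate_batches(dataset, batch_size):
    # Batch composition is fixed once, up front: indices are sorted by
    # sequence length and chunked, and the range stops before the last
    # partial batch, so the trailing len(dataset) % batch_size items
    # never land in any batch.
    idx = sorted(range(len(dataset)), key=lambda i: len(dataset[i]))
    batch_idx = [
        idx[i : i + batch_size]
        for i in range(0, len(idx) - batch_size + 1, batch_size)
    ]
    # Only the *order* of the fixed batches is shuffled each epoch.
    while True:
        for i in np.random.permutation(len(batch_idx)):
            yield [dataset[j] for j in batch_idx[i]]

# 10 sequences with batch size 4: 10 % 4 = 2 items can never be yielded.
dataset = [[0] * n for n in range(1, 11)]
seen = set()
for _, batch in zip(range(1000), iterate_batches(dataset, batch_size=4)):
    seen.update(len(item) for item in batch)

print(sorted(seen))  # [1, ..., 8]; the two longest items (9 and 10) never appear
```

Because the batches are assembled before the infinite shuffle loop, running more iterations only revisits the same fixed batches; the dropped remainder is never reached, no matter how long training runs.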