Closed MarcBresson closed 5 months ago
Hi @MarcBresson, we already support this. Have a look at this response.
Thank you for your quick answer! I didn't know PyTorch Lightning supported that. I will dive deeper to understand how it does the subsampling.
I stumbled upon #1348 that was mentioned in #2129, and I think my feature request was exactly the same. Should #1348 be closed too, stating that it can be done with PyTorch Lightning?
Note that the samples are obtained from the time series at the dataset level; it is not the number of samples processed during one epoch at the trainer level. That should also answer your question about how the subsampling is performed ;) Closing this one as the original question was answered (hooray for Lightning).
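To make the trainer-level cap concrete: PyTorch Lightning's `Trainer` accepts a `limit_train_batches` argument that stops each epoch after a fixed number of batches. The loop below is an illustrative pure-Python sketch of that behaviour (not Lightning's actual internals), using a hypothetical `train_one_epoch` helper:

```python
from itertools import islice

def train_one_epoch(batches, limit_train_batches):
    """Illustrative sketch, not Lightning internals: process at most
    `limit_train_batches` batches per epoch, mimicking what passing
    `Trainer(limit_train_batches=...)` achieves."""
    samples_seen = 0
    for batch in islice(batches, limit_train_batches):
        samples_seen += len(batch)  # a real loop would run forward/backward here
    return samples_seen

# A toy "epoch" of 10 batches of 4 samples each, capped at 3 batches:
batches = [[0, 1, 2, 3] for _ in range(10)]
print(train_one_epoch(iter(batches), limit_train_batches=3))  # → 12
```

If the cap exceeds the number of available batches, `islice` simply exhausts the iterator and the whole dataset is seen, which matches Lightning's behaviour of treating the limit as an upper bound.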
feature request
Is your feature request related to a current problem? Please describe. I once read an issue on the GluonTS repository about why they use both the batch_size and the number_of_batch_per_epoch (effectively fixing the number of samples per epoch). They argue that, with panel time series, datasets can become extremely large. Fixing both parameters is then a means to avoid overly long training phases by limiting the number of samples seen per epoch.
Describe proposed solution A new parameter number_of_batch_per_epoch could be added as a way to fix the number of batches (and hence samples) per epoch. Alternatively, we could have a number_of_sample_per_epoch parameter, in which case number_of_batch_per_epoch would be computed automatically. This would pose an issue if the requested number of samples is larger than the dataset: should we throw an error, or just a warning while still training on the entire dataset?
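The automatic computation and the open question can be sketched in a few lines. The helper name and the warn-and-clamp policy are hypothetical illustrations of the proposal, not an actual darts or GluonTS API:

```python
import math
import warnings

def batches_per_epoch(number_of_sample_per_epoch, batch_size, dataset_size):
    """Hypothetical helper mirroring the proposed parameters: derive
    number_of_batch_per_epoch from a requested number of samples per epoch."""
    if number_of_sample_per_epoch > dataset_size:
        # One possible answer to the open question: warn and fall back to
        # training on the entire dataset instead of raising an error.
        warnings.warn("requested more samples than the dataset contains; "
                      "training on the entire dataset instead")
        number_of_sample_per_epoch = dataset_size
    return math.ceil(number_of_sample_per_epoch / batch_size)

print(batches_per_epoch(1000, 32, 10_000))  # → 32 (ceil(1000 / 32))
```

The ceiling ensures the requested sample count is reached even when it is not a multiple of the batch size; the last batch of the epoch may simply be partial.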