neuronets / nobrainer

A framework for developing neural network models for 3D image processing.

Refactor code to identify correct steps per epoch #325

Open hvgazula opened 7 months ago

hvgazula commented 7 months ago

The output below shows a case where the steps per epoch are incorrect while the number of batches is correct. The mismatch arises because get_steps_per_epoch relies on n_volumes, whereas the number of batches is obtained by iterating through the entire dataset, which can be time-consuming. Currently, n_volumes is estimated by counting the volumes in the first shard and multiplying that count by the total number of shards. This is only exact when every shard holds the same number of volumes. One option is to apply drop_remainder when the shards are written.

loading data
n_volumes: 9
Function: load_custom_tfrec Total runtime: 0:00:01.757646 (HH:MM:SS)
n_volumes: 5
Function: load_custom_tfrec Total runtime: 0:00:01.943470 (HH:MM:SS)
Train Batches (@ 2 GPUS): 4
Eval Batches (@ 2 GPUS): 2
Train steps per epoch: 5
Eval steps per epoch: 3
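
To make the mismatch concrete, here is a minimal sketch in plain Python. It is not nobrainer's implementation: the function names, the shard sizes, and the batch size are all hypothetical, and the sketch assumes that steps per epoch is a ceiling division of n_volumes by the batch size while batching drops the last partial batch.

```python
import math


def estimate_n_volumes(shard_sizes):
    """First-shard heuristic: size of the first shard times the number of shards."""
    return shard_sizes[0] * len(shard_sizes)


def steps_per_epoch(n_volumes, batch_size):
    """Steps per epoch as a ceiling division (assumed behaviour, not confirmed)."""
    return math.ceil(n_volumes / batch_size)


def n_batches(n_volumes, batch_size, drop_remainder=True):
    """Batches actually produced when the last partial batch is (optionally) dropped."""
    if drop_remainder:
        return n_volumes // batch_size
    return math.ceil(n_volumes / batch_size)


# Hypothetical shard layout: the last shard is smaller than the others.
shard_sizes = [3, 3, 3, 1]
batch_size = 2  # hypothetical global batch size

estimated = estimate_n_volumes(shard_sizes)  # heuristic count
actual = sum(shard_sizes)                    # exact count, needs every shard

print("estimated n_volumes:", estimated)
print("actual n_volumes:   ", actual)
print("steps per epoch (from estimate):", steps_per_epoch(estimated, batch_size))
print("batches (from actual data):     ", n_batches(actual, batch_size))
```

With these made-up numbers the heuristic overcounts the volumes, so the step count derived from it exceeds the batches the dataset can deliver. Separately, the logged values above (4 batches vs 5 steps for 9 volumes, 2 batches vs 3 steps for 5 volumes) equal floor versus ceiling division by 2. That reading assumes a global batch size of 2, which the log does not state.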

hvgazula commented 7 months ago

Again, if from_files is used, this is not an issue the very first time, because n_volumes is specified ahead of time. It can be a problem, however, when from_tfrecords is used. Of course, the solution is https://github.com/neuronets/nobrainer/issues/321 :)
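
For the from_tfrecords path, the first-shard estimate is exact only when every shard holds the same number of volumes, which is what the drop_remainder suggestion earlier in this thread would guarantee. The sketch below illustrates that idea in plain Python. The helper name and file names are hypothetical, and it only groups file paths; it does not write TFRecords or use any nobrainer API.

```python
def split_into_equal_shards(items, shard_size):
    """Group items into shards of exactly shard_size, dropping any leftover items."""
    n_full = len(items) // shard_size
    return [items[i * shard_size:(i + 1) * shard_size] for i in range(n_full)]


# Hypothetical list of ten volume paths.
volumes = [f"vol_{i:02d}.nii.gz" for i in range(10)]

shards = split_into_equal_shards(volumes, shard_size=3)

# With equal-size shards, the first-shard heuristic matches the exact count.
estimate = len(shards[0]) * len(shards)
exact = sum(len(shard) for shard in shards)
dropped = len(volumes) - exact

print("shards:", len(shards), "estimate:", estimate, "exact:", exact, "dropped:", dropped)
```

The trade-off is that leftover volumes (one out of ten in this example) are never written, so the estimate becomes exact at the cost of discarding some data.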