Open hvgazula opened 7 months ago
Ideally, if the TFRecords are created using the API with the aforementioned change, every shard except the last one is guaranteed to hold the same number of records. If `n_volumes` is not specified, it can then be calculated by this function as `num_records_first_shard * (num_shards - 1) + num_records_in_last_shard`:
https://github.com/neuronets/nobrainer/blob/976691d685824fd4bba836498abea4184cffd798/nobrainer/dataset.py#L115-L122
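To make the arithmetic concrete, here is a minimal sketch of that formula as a pure function. The function and argument names are hypothetical, chosen for illustration, and are not taken from the nobrainer code linked above.

```python
def total_volumes(records_in_first_shard, num_shards, records_in_last_shard):
    """Total record count when every shard except the last has the same size.

    Hypothetical helper for illustration; not part of nobrainer.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # All shards but the last are assumed to match the first shard's size.
    return records_in_first_shard * (num_shards - 1) + records_in_last_shard


# Example: 3 shards holding 4, 4 and 2 records.
n_volumes = total_volumes(4, 3, 2)
```

With this approach only the first and last shards need to be read, rather than every shard.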
If the number of volumes in a shard is too large, this snippet of code can be time-consuming. An alternative is to use `n_volumes` and the number of files matching `file_pattern` to calculate `len(first_shard)`.
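As a rough sketch of that alternative: the shard length could be estimated from `n_volumes` and the shard count without reading any records. This assumes the writer used the smallest per-shard count that yields that many shards; if a larger shard size was chosen, the estimate will be too low. The function name is hypothetical, and in practice `num_shards` might come from something like `len(glob.glob(file_pattern))`.

```python
import math


def estimate_first_shard_len(n_volumes, num_shards):
    """Estimate records per full shard from the total and the shard count.

    Hypothetical helper; assumes shards were filled as evenly as possible,
    with only the last shard allowed to be smaller.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Smallest shard size that fits n_volumes into num_shards shards.
    return math.ceil(n_volumes / num_shards)


# Example: 10 volumes over 3 shards gives shards of 4, 4 and 2.
first_shard_len = estimate_first_shard_len(10, 3)
```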