neuronets / nobrainer

A framework for developing neural network models for 3D image processing.

Sharding doesn't result in examples of specified size #329

Closed hvgazula closed 5 months ago

hvgazula commented 7 months ago

https://github.com/neuronets/nobrainer/blob/976691d685824fd4bba836498abea4184cffd798/nobrainer/tfrecord.py#L62-L64

For example, if examples_per_shard = 20 and len(feature_labels) = 90, the above snippet produces 5 shards of 18 volumes each, instead of 4 shards of 20 volumes each plus a 5th shard of 10 volumes. I prefer the latter behavior, as it aligns with what the function is expected to do.
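The arithmetic above can be reproduced with a small sketch. It assumes the linked lines compute a shard count with a ceiling division and pass that count to np.array_split; the linked source is not quoted here, so treat that as an assumption. The integer list is a stand-in for the real (feature, label) pairs.

```python
import math

import numpy as np

# Stand-in for 90 (feature, label) pairs; the real entries are file paths.
feature_labels = list(range(90))
examples_per_shard = 20

# Assumption: the linked code derives a shard count, then splits evenly.
n_shards = math.ceil(len(feature_labels) / examples_per_shard)
even_shards = np.array_split(feature_labels, n_shards)

# Passing an integer asks array_split for near-equal sections,
# so the requested shard size is not preserved.
even_sizes = [len(shard) for shard in even_shards]
print(n_shards, even_sizes)
```

Passing an integer to np.array_split balances the sections, which is why every shard ends up with 18 examples rather than 20.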

hvgazula commented 7 months ago

resolution:

n_examples = len(feature_labels)
shards = np.array_split(feature_labels, np.arange(examples_per_shard, n_examples, examples_per_shard))

this way examples_per_shard takes precedence
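A quick check of this approach, as a sketch on stand-in data: passing split indices (rather than a section count) to np.array_split cuts the list at every multiple of examples_per_shard. The helper name shard_sizes is hypothetical and exists only for this illustration.

```python
import numpy as np


def shard_sizes(n_examples, examples_per_shard):
    """Hypothetical helper: sizes of the shards produced by index-based splitting."""
    items = list(range(n_examples))  # stand-in for (feature, label) pairs
    # Split points at examples_per_shard, 2 * examples_per_shard, ...
    boundaries = np.arange(examples_per_shard, n_examples, examples_per_shard)
    return [len(shard) for shard in np.array_split(items, boundaries)]


sizes_remainder = shard_sizes(90, 20)  # last shard holds the remainder
sizes_exact = shard_sizes(80, 20)      # exact multiple, no short shard
sizes_small = shard_sizes(5, 20)       # fewer examples than one shard
print(sizes_remainder, sizes_exact, sizes_small)
```

Only the final shard can be smaller than examples_per_shard, and when there are fewer examples than one shard the result is a single shard.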

hvgazula commented 7 months ago

in hindsight...maybe have logic that accepts EITHER num_examples_per_shard OR num_shards?
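One way this either/or idea could look, as a rough sketch only. The function name make_shards and its keyword arguments are hypothetical and are not part of nobrainer's API.

```python
import numpy as np


def make_shards(items, examples_per_shard=None, num_shards=None):
    """Hypothetical sketch: split by shard size OR by shard count, never both."""
    if (examples_per_shard is None) == (num_shards is None):
        raise ValueError("Pass exactly one of examples_per_shard or num_shards.")

    if num_shards is not None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1.")
        # Integer argument: near-equal shard sizes.
        return np.array_split(items, num_shards)

    if examples_per_shard < 1:
        raise ValueError("examples_per_shard must be at least 1.")
    # Index argument: fixed-size shards, remainder in the last one.
    boundaries = np.arange(examples_per_shard, len(items), examples_per_shard)
    return np.array_split(items, boundaries)


items = list(range(90))  # stand-in for (feature, label) pairs
by_size = [len(s) for s in make_shards(items, examples_per_shard=20)]
by_count = [len(s) for s in make_shards(items, num_shards=5)]
print(by_size, by_count)
```

Rejecting calls that pass both arguments (or neither) avoids having to decide which one takes precedence.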