hvgazula closed this issue 5 months ago
resolution:
n_examples = len(feature_labels)
shards = np.array_split(feature_labels, np.arange(examples_per_shard, n_examples, examples_per_shard))
This way, examples_per_shard takes precedence. In hindsight, maybe there should be logic that accepts EITHER num_examples_per_shard OR num_shards?
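One way the EITHER/OR idea could look is sketched below. The helper name `split_into_shards` and its `num_shards` keyword are hypothetical and not part of nobrainer; the sketch only assumes that exactly one of the two arguments is given and reuses the `np.array_split` call from the resolution above.

```python
import numpy as np


def split_into_shards(items, examples_per_shard=None, num_shards=None):
    """Hypothetical helper: split `items` by shard size OR by shard count."""
    # Exactly one of the two arguments must be provided.
    if (examples_per_shard is None) == (num_shards is None):
        raise ValueError("Pass exactly one of examples_per_shard or num_shards.")

    n_examples = len(items)
    if examples_per_shard is not None:
        if examples_per_shard < 1:
            raise ValueError("examples_per_shard must be a positive integer.")
        # Cut at multiples of examples_per_shard, so every shard is full
        # except possibly the last one.
        cut_points = np.arange(examples_per_shard, n_examples, examples_per_shard)
        return np.array_split(items, cut_points)

    if num_shards < 1:
        raise ValueError("num_shards must be a positive integer.")
    # Fixed number of shards, sized as evenly as possible.
    return np.array_split(items, num_shards)


by_size = split_into_shards(np.arange(90), examples_per_shard=20)
by_count = split_into_shards(np.arange(90), num_shards=5)
print([len(s) for s in by_size])   # [20, 20, 20, 20, 10]
print([len(s) for s in by_count])  # [18, 18, 18, 18, 18]
```

Raising an error when both or neither argument is passed keeps the precedence question from coming up again, at the cost of a slightly stricter call signature.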
https://github.com/neuronets/nobrainer/blob/976691d685824fd4bba836498abea4184cffd798/nobrainer/tfrecord.py#L62-L64
For example, if examples_per_shard = 20 and len(feature_labels) = 90, the above snippet will result in 5 shards with 18 volumes per shard, instead of 4 shards with 20 volumes each and a 5th shard with 10 volumes. I prefer the latter implementation, as it aligns with what the function is expected to do.
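The two outcomes can be reproduced with a short comparison. This is a sketch that assumes the linked lines compute the shard count with a ceiling division and pass it to `np.array_split`, which is what the 5 x 18 result suggests; `np.arange(90)` is only a stand-in for the real list of feature/label pairs.

```python
import math

import numpy as np

examples_per_shard = 20
feature_labels = np.arange(90)  # stand-in for 90 (feature, label) entries
n_examples = len(feature_labels)

# Assumed current behaviour: derive a shard COUNT, then split evenly.
n_shards = math.ceil(n_examples / examples_per_shard)
even_shards = np.array_split(feature_labels, n_shards)
print([len(s) for s in even_shards])  # [18, 18, 18, 18, 18]

# Preferred behaviour: cut at multiples of examples_per_shard, so only the
# last shard can be smaller than requested.
cut_points = np.arange(examples_per_shard, n_examples, examples_per_shard)
fixed_shards = np.array_split(feature_labels, cut_points)
print([len(s) for s in fixed_shards])  # [20, 20, 20, 20, 10]
```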