neuronets / nobrainer

A framework for developing neural network models for 3D image processing.

Shard size is automatically determined to produce ~100MB tfrecords files #258

Open ohinds opened 10 months ago

ohinds commented 10 months ago

According to the TensorFlow user guide, TFRecord files should be ~100MB (https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/performance/overview.md). When TFRecord datasets are constructed from files, the shard size could be computed automatically to follow this guidance.
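A minimal sketch of what "automatically computed" could mean, assuming the dataset's total serialized size and its number of examples are already known. The helper name `examples_per_shard` and the `target_bytes` parameter are hypothetical, not part of nobrainer. Raw file sizes on disk (e.g. gzipped NIfTI) can differ a lot from serialized example sizes, so this gives only an estimate.

```python
import math

# Hypothetical default taken from the ~100MB guidance; callers can override it.
DEFAULT_TARGET_BYTES = 100 * 1024 * 1024


def examples_per_shard(total_bytes, n_examples, target_bytes=DEFAULT_TARGET_BYTES):
    """Estimate how many examples to put in each shard to get near target_bytes.

    total_bytes: approximate serialized size of the whole dataset.
    n_examples: number of examples (e.g. volume/label pairs).
    target_bytes: desired size of one shard file.
    """
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    avg_bytes = total_bytes / n_examples
    # Always put at least one example per shard, even if a single example
    # is larger than the target size.
    return max(1, math.floor(target_bytes / avg_bytes))
```

For example, 1000 examples of about 16 MiB each give 6 examples per shard with the default target, while very large volumes fall back to one example per shard.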

satra commented 10 months ago

100MB doesn't make sense on fast disk systems like the ones we have on openmind, or for brain imaging data. I believe we have experimented with TB-sized shards as well. I would make this a user-controllable parameter.

ohinds commented 10 months ago

Well, the default currently produces TFRecord files of about 20MB, so that makes even less sense. I'm suggesting an automatically determined default, with the option for people to override it if they want something else.

Also, specifying a shard size in bytes makes way more sense than specifying a number of examples, which is what it currently uses.
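A rough sketch of byte-based sharding, assuming examples are already serialized to `bytes` (e.g. via `SerializeToString()` on a `tf.train.Example`). The function `shard_by_bytes` is hypothetical. It greedily packs records until the next one would overflow the budget and ignores the small fixed per-record framing overhead of the TFRecord format.

```python
from typing import Iterable, List


def shard_by_bytes(records: Iterable[bytes], max_shard_bytes: int) -> List[List[bytes]]:
    """Greedily group serialized records into shards of at most max_shard_bytes.

    A record larger than max_shard_bytes is placed in a shard of its own.
    """
    if max_shard_bytes <= 0:
        raise ValueError("max_shard_bytes must be positive")
    shards: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for rec in records:
        size = len(rec)
        # Close the current shard if this record would push it over the budget.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(rec)
        current_size += size
    if current:
        shards.append(current)
    return shards
```

Each inner list could then be written to its own file with `tf.io.TFRecordWriter`. Sizing by bytes keeps shard files roughly uniform even when volume dimensions vary across the dataset.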

hvgazula commented 3 months ago

Probably a combination of du -hL /path/to/data and this might do?
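A possible Python stand-in for the `du -hL` step, as a sketch only. The helper `dataset_size_bytes` is hypothetical. It follows symlinks like `-L`, but it sums apparent file sizes (`st_size`) rather than allocated disk blocks, so it corresponds more closely to `du -L --apparent-size`. Files reached through several links are counted once, and symlinked directory cycles are skipped.

```python
import os


def dataset_size_bytes(root: str) -> int:
    """Sum apparent sizes of all files under root, following symlinks."""
    seen_dirs = set()
    seen_files = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        st = os.stat(dirpath)
        dir_key = (st.st_dev, st.st_ino)
        if dir_key in seen_dirs:
            # Already visited (e.g. a symlink cycle): do not descend again.
            dirnames[:] = []
            continue
        seen_dirs.add(dir_key)
        for name in filenames:
            try:
                fst = os.stat(os.path.join(dirpath, name))  # follows symlinks
            except OSError:
                continue  # broken symlink or unreadable file
            file_key = (fst.st_dev, fst.st_ino)
            if file_key not in seen_files:
                seen_files.add(file_key)
                total += fst.st_size
    return total
```

Dividing the result by the number of examples gives a crude per-example size estimate. Compressed inputs will underestimate the serialized size.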