lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Support for pre-determined batch sizes in DynamicBucketingSampler #1372

Closed pzelasko closed 4 months ago

pzelasko commented 4 months ago

If the maximum possible batch size for every bucket can be precomputed, these sizes can be provided to the sampler using this mechanism. It is an alternative to the automatic batch size selection based on max_duration and quadratic_duration. With precomputed per-bucket sizes, it can guarantee close-to-maximum GPU memory utilization without running into OOM during training.
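To illustrate the idea, here is a minimal, library-independent sketch of looking up a precomputed batch size by bucket. It does not use the Lhotse API. The names `duration_bins`, `bucket_batch_sizes`, `bucket_index`, and `batch_size_for`, and all numeric values, are assumptions made up for this example.

```python
from bisect import bisect_left

# Hypothetical values for illustration only.
# Upper duration bounds (in seconds) that separate the buckets:
# bucket 0: <= 5s, bucket 1: (5, 10], bucket 2: (10, 20], bucket 3: > 20s.
duration_bins = [5.0, 10.0, 20.0]
# One precomputed maximum batch size per bucket (len(duration_bins) + 1 entries).
bucket_batch_sizes = [64, 32, 16, 8]


def bucket_index(duration: float) -> int:
    """Return the index of the bucket whose duration range contains `duration`."""
    return bisect_left(duration_bins, duration)


def batch_size_for(duration: float) -> int:
    """Return the precomputed batch size of the bucket that `duration` falls into."""
    return bucket_batch_sizes[bucket_index(duration)]


if __name__ == "__main__":
    for d in (3.0, 7.5, 20.0, 25.0):
        print(f"{d:>5.1f}s -> bucket {bucket_index(d)}, batch size {batch_size_for(d)}")
```

The point of the fixed lookup is that the batch size depends only on the bucket, not on the summed duration of the sampled examples, so each bucket can be tuned separately against the available memory.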

An example of how to determine maximum batch sizes for a given model+optimizer+hardware configuration is available in https://github.com/NVIDIA/NeMo/pull/9763
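For intuition only, one generic way to find such a maximum is a binary search over candidate batch sizes. The sketch below is not taken from the linked PR and may differ from what it does. `fits` is a hypothetical callback that would, in practice, run one training step at the given batch size and report whether it completed without OOM. The search assumes that if a batch size fits, every smaller one fits too.

```python
from typing import Callable


def max_fitting_batch_size(fits: Callable[[int], bool], upper: int) -> int:
    """Return the largest batch size in [0, upper] for which `fits` is True.

    Assumes monotonicity: if a batch size fits, all smaller ones fit as well.
    Returns 0 if not even a batch size of 1 fits.
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            lo = mid              # mid fits; the answer is mid or larger
        else:
            hi = mid - 1          # mid does not fit; the answer is smaller
    return lo


if __name__ == "__main__":
    # Stand-in for a real memory check: pretend each example costs 10 units
    # and the device has 245 units available (made-up numbers).
    def fake_fits(batch_size: int) -> bool:
        return batch_size * 10 <= 245

    print(max_fitting_batch_size(fake_fits, upper=100))
```

Running such a search once per bucket, with inputs at that bucket's maximum duration, would yield one batch size per bucket for a given model, optimizer, and hardware configuration.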