Mahaotian1 opened 2 months ago
Unfortunately, yes. Restoring the sampler's state is quite tricky to do quickly, and I don't recommend using this technique with large data. Instead, it's easier to discard the sampler state and change the random seed to randomize the training data.
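The suggestion above can be sketched in plain Python. This is not Lhotse API, just an illustration of the idea: instead of restoring the sampler's position, derive a fresh seed from the resume epoch and reshuffle the data (the names `base_seed` and `resume_epoch` are hypothetical):

```python
import random

def reshuffle_for_resume(items, base_seed, resume_epoch):
    """Sketch of 'discard sampler state, change the seed': derive a new
    deterministic shuffle order from the epoch at which training resumes.
    `base_seed` and `resume_epoch` are illustrative names, not Lhotse API.
    """
    rng = random.Random(base_seed + resume_epoch)  # fresh order per resume
    order = list(items)
    rng.shuffle(order)
    return order

data = list(range(8))
run1 = reshuffle_for_resume(data, base_seed=42, resume_epoch=3)
run1_again = reshuffle_for_resume(data, base_seed=42, resume_epoch=3)
run2 = reshuffle_for_resume(data, base_seed=42, resume_epoch=4)
print(run1 == run1_again)        # same seed -> same order -> True
print(sorted(run2) == data)      # same items, likely a different order -> True
```

The point is that reshuffling is O(1) in extra state to save and restore, whereas replaying the sampler to its exact previous position can be very slow on large datasets.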
Thank you for your reply. I have another question: during training on large-scale data, I use load_manifest_lazy to read the data and draw batches from it. Will this cause CPU memory to fill up?
No, CPU RAM usage should be bounded by the buffer_size setting in the sampler.
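For intuition, here is a minimal sketch of the bounded-buffer shuffling idea behind a setting like buffer_size (a generic illustration, not Lhotse's actual sampler code): items stream through a fixed-size buffer and a random element is emitted whenever the buffer is full, so memory stays constant regardless of dataset size:

```python
import random

def buffered_shuffle(stream, buffer_size, seed=0):
    """Stream items through a fixed-size shuffling buffer, so RAM stays
    bounded by `buffer_size` no matter how large the dataset is (the same
    idea as the sampler's buffer_size setting; names are illustrative).
    """
    rng = random.Random(seed)
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) >= buffer_size:
            idx = rng.randrange(len(buf))
            yield buf.pop(idx)   # emit a random element, keep buffer full
    rng.shuffle(buf)
    yield from buf               # drain the remainder at end of stream

out = list(buffered_shuffle(iter(range(1000)), buffer_size=50, seed=7))
print(len(out))                           # all items pass through -> 1000
print(sorted(out) == list(range(1000)))   # nothing lost or duplicated -> True
```

The trade-off is that a small buffer bounds memory but only shuffles locally; a larger buffer gives a more global shuffle at the cost of more RAM.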
Why does CPU memory keep increasing during training until it is full? Is it a problem with the HDF5 files? How can I free the memory?
Are you using HDF5 files? We have a workaround in the ASR dataset class, but IIRC it only slows down the memory leak. You can try the Lhotse Shar format instead, or LilcomChunkyWriter, which are free of these issues. For large data, Lhotse Shar is recommended, as it is much more I/O efficient.
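For readers curious what a workaround of this kind can look like, here is a generic, self-contained sketch (my own illustration, not the actual Lhotse/ASR dataset code): long-lived readers that cache one open handle per archive file grow memory without bound on large datasets, so the cache must evict and close old handles:

```python
import os
import tempfile
from collections import OrderedDict

class BoundedHandleCache:
    """Keep at most `maxsize` file handles open; close the oldest on
    eviction.  A generic illustration of bounding per-file reader state,
    not the actual workaround code in the ASR dataset class.
    """
    def __init__(self, maxsize=4):
        self.maxsize = maxsize
        self._handles = OrderedDict()

    def get(self, path):
        if path in self._handles:
            self._handles.move_to_end(path)  # mark as recently used
            return self._handles[path]
        handle = open(path, "rb")
        self._handles[path] = handle
        if len(self._handles) > self.maxsize:
            _, old = self._handles.popitem(last=False)  # evict oldest
            old.close()  # release OS resources held by the stale handle
        return handle

# demo with a few temporary files standing in for feature archives
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(10):
    p = os.path.join(tmpdir, f"shard{i}.bin")
    with open(p, "wb") as f:
        f.write(b"x")
    paths.append(p)

cache = BoundedHandleCache(maxsize=4)
first = cache.get(paths[0])
for p in paths:
    cache.get(p)

print(len(cache._handles))  # cache stays bounded -> 4
print(first.closed)         # oldest handle was closed on eviction -> True
```

With HDF5 specifically, caches inside the reader library itself can also hold memory per open file, which is why closing stale handles (or switching to a sequential format like Lhotse Shar) helps.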
When I resumed training on 30,000 hours of data from a checkpoint, it took a long time (more than 2 hours) to load the state dict for DynamicBucketingSampler. Is that normal?
Here is my code: