lifeiteng / vall-e

PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
Apache License 2.0

Why is DynamicBucketingSampler used in the default setting? #177

Closed: craggy-otake closed this issue 9 months ago

craggy-otake commented 9 months ago

Thanks for the nice repository. In VALL-E, it is important that the same speaker's voice appears within a batch when training the NAR model. However, the default configuration uses DynamicBucketingSampler, which sorts the data by duration, so a batch ends up being constructed from different speakers' voices (see the sketch after the snippet below).

valle/data/datamodule.py

if self.args.bucketing_sampler:
    logging.info("Using DynamicBucketingSampler")
    train_sampler = DynamicBucketingSampler(
        cuts_train,
        max_duration=self.args.max_duration,
        shuffle=self.args.shuffle,
        num_buckets=self.args.num_buckets,
        drop_last=True,
    )
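To make the concern concrete, here is a minimal sketch using lhotse's CutSet and DynamicBucketingSampler (the manifest path and max_duration value are hypothetical, not taken from this repo). It shows that batches are grouped by similar duration, while speaker identity is never considered.

from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_file("cuts_train.jsonl.gz")  # hypothetical manifest path

# Buckets cuts by duration and draws batches up to max_duration seconds of audio.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=80.0,
    shuffle=True,
    num_buckets=10,
    drop_last=True,
)

for batch_cuts in sampler:
    # Each batch holds cuts of similar length; the speakers in it are arbitrary.
    print({c.supervisions[0].speaker for c in batch_cuts})
    break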
craggy-otake commented 9 months ago

Sorry, my mistake; your implementation already handles this. After reading the code more carefully, I understand what you did in valle.py: during training, a single audio sample is split into a reference (prompt) segment and a target segment inside the forward function. Thank you.
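For other readers, a minimal sketch of that idea (not the repository's exact code; the function name and prompt-length range below are illustrative): each utterance's codec tokens are split into a reference prefix used as the acoustic prompt and a target suffix, so prompt and target always come from the same speaker even when the batch mixes speakers.

import torch

def split_prompt_and_target(codes: torch.Tensor, min_prompt: int = 75, max_prompt: int = 225):
    # codes: (T, num_quantizers) codec tokens for one utterance.
    # Pick a random-length prefix as the same-speaker acoustic prompt; the rest is the target.
    T = codes.shape[0]
    high = min(max_prompt, T // 2)
    prompt_len = torch.randint(min_prompt, high + 1, (1,)).item()
    return codes[:prompt_len], codes[prompt_len:]

# Usage: applied per utterance inside forward() during training, so the prompt
# and the prediction target share a speaker regardless of how the batch was sampled.
codes = torch.randint(0, 1024, (600, 8))  # dummy EnCodec-style token matrix
prompt, target = split_prompt_and_target(codes)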