ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Why do we need `batch_size=batch_size * group_size`? What is the point of the `group_size` variable? #132

Open chenming6615 opened 2 years ago

chenming6615 commented 2 years ago

How does setting group_size larger than 1 enable sorting in the Dataset?

And why do we need sorting in the Dataset in the first place?

https://github.com/ming024/FastSpeech2/blob/d4e79eb52e8b01d24703b2dfc0385544092958f3/train.py#L31

```python
batch_size = train_config["optimizer"]["batch_size"]
group_size = 4  # Set this larger than 1 to enable sorting in Dataset
assert batch_size * group_size < len(dataset)
loader = DataLoader(
    dataset,
    batch_size=batch_size * group_size,
    shuffle=True,
    collate_fn=dataset.collate_fn,
)
```
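If I read the PyTorch docs correctly, this just means collate_fn receives a list of batch_size * group_size raw samples on every iteration. A toy check of that standard DataLoader behavior (not the repo's code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size, group_size = 2, 4
dataset = TensorDataset(torch.arange(100))

def collate_fn(batch):
    # The DataLoader hands the collate function batch_size * group_size
    # raw samples at once, before any batching logic of our own runs.
    print("collate_fn received", len(batch), "samples")
    return batch

loader = DataLoader(
    dataset,
    batch_size=batch_size * group_size,
    shuffle=True,
    collate_fn=collate_fn,
)
next(iter(loader))  # prints: collate_fn received 8 samples
```

So the real question is what the repo's Dataset.collate_fn does with those 8 samples.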
Georgehappy1 commented 2 years ago

@chenming6615 Because in the collate_fn function, all the utterances in a batch are sorted by length, and batching similar-length utterances together reduces padding. If we set group_size > 1, we sort all the utterances in one big batch, then split the big batch into group_size small batches for training. Each small batch then contains utterances of similar length, so less time is wasted computing over padding.
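For illustration, here is a minimal sketch of the sort-then-split idea (the sort_and_split helper is hypothetical, not the repo's actual collate_fn): sort the big batch by utterance length, then cut it into group_size real batches so each real batch holds utterances of similar length.

```python
import numpy as np

def sort_and_split(lengths, batch_size, group_size):
    # lengths: lengths of the batch_size * group_size utterances
    # that the DataLoader delivered as one big batch
    big_batch = np.array(lengths)
    order = np.argsort(-big_batch)  # longest utterances first
    sorted_lengths = big_batch[order]
    # cut the sorted big batch into group_size small batches
    return [
        sorted_lengths[i * batch_size : (i + 1) * batch_size]
        for i in range(group_size)
    ]

lengths = [120, 35, 98, 40, 110, 33, 95, 42]  # toy utterance lengths
for small_batch in sort_and_split(lengths, batch_size=2, group_size=4):
    # each small batch is padded only to its own max length, which is now
    # close to every length inside it, so little computation is wasted
    print(small_batch.tolist(), "-> padded to", small_batch.max())
```

Without the sort, one small batch could mix a length-120 utterance with a length-33 one, and everything in it would be padded to 120.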

chenming6615 commented 2 years ago

> @chenming6615 Because in the collate_fn function, all the utterances in a batch are sorted by length, and batching similar-length utterances together reduces padding. If we set group_size > 1, we sort all the utterances in one big batch, then split the big batch into group_size small batches for training. Each small batch then contains utterances of similar length, so less time is wasted computing over padding.

Thanks! But why do we need to sort all the utterances in a batch? Will it improve the training speed or accuracy?