[Feature]: MaskGCT long-form audio

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

https://openhlt.github.io/amphion/

MIT License

7.7k stars, 584 forks

[Feature]: MaskGCT long-form audio #290

Status: Open (opened by fakerybakery 1 month ago)

fakerybakery commented 1 month ago

Hi, thanks for releasing MaskGCT! Are there any plans to support long-form speech synthesis other than by chunking? Thanks!
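
For readers unfamiliar with the chunking workaround mentioned above, here is a minimal sketch of the general idea: split the text at sentence boundaries, synthesize each piece separately, and join the results with short pauses. This is not Amphion's or MaskGCT's API. The `synthesize` callable, the 24000 Hz sample rate, the 200-character limit, and the 0.2 s pause are all assumptions chosen for illustration, and audio is represented as a plain list of float samples to keep the example dependency-free.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(
    text: str,
    synthesize: Callable[[str], List[float]],  # hypothetical TTS callable
    sample_rate: int = 24000,                  # assumed output sample rate
    pause_s: float = 0.2,                      # assumed pause between chunks
    max_chars: int = 200,                      # assumed per-chunk text limit
) -> List[float]:
    """Synthesize each chunk independently and join them with silence."""
    silence = [0.0] * int(sample_rate * pause_s)
    audio: List[float] = []
    for i, chunk in enumerate(split_into_chunks(text, max_chars)):
        if i > 0:
            audio.extend(silence)
        audio.extend(synthesize(chunk))
    return audio
```

Because each chunk is generated independently, prosody and speaker characteristics can drift across chunk boundaries, which is the usual motivation for asking about native long-form support.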

HeCheng0625 commented 4 weeks ago

Hi, thank you for your attention. In the future, we will expand the training data to the minute level and use a codec with a higher compression rate to generate longer audio.

JonathanFly commented 3 weeks ago

> Hi, thank you for your attention. In the future, we will expand the training data to the minute level and use a codec with a higher compression rate to generate longer audio.

In the current version, how long is each training-data "chunk"?