How to convert text to target audio (TTS) using ns3_codec (naturalspeech3)

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

https://openhlt.github.io/amphion/

MIT License

How to convert text to target audio (TTS) using ns3_codec (naturalspeech3) #166

Closed. aguang1201 closed this issue 4 months ago

aguang1201 commented 5 months ago

Thanks for the excellent work. I noticed that ns3_codec can blend two audio clips together perfectly, but if I have a text and an audio clip, how do I get the voice in that clip to read out the text with ns3_codec?

HeCheng0625 commented 5 months ago

Hi, since FACodec is not an ASR model, it can only get the content representation from content codes. You can check https://github.com/open-mmlab/Amphion/blob/main/models/codec/ns3_codec/README.md

aguang1201 commented 5 months ago

@HeCheng0625 Thanks, got it: FACodec is not a TTS model. Let me rephrase my question: how does FACodec work with a TTS model?

HeCheng0625 commented 5 months ago

Hi, you can follow the FACodec and NaturalSpeech 3 paper: https://arxiv.org/abs/2403.03100; we are working on reproducing NS3.
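
For readers of this thread, the division of labour described in the paper can be sketched roughly as follows: a text-conditioned generator predicts the factorized code streams, the timbre vector comes from the prompt audio, and a FACodec-style decoder would turn the result into a waveform. This is a toy illustration only. `hypothetical_tts_generator` is a made-up stand-in for the unreleased NS3 model, the codes are random placeholders, and the frame rate, codebook size, quantizer counts, and 256-dimensional timbre vector are assumptions based on the paper rather than Amphion's API.

```python
import random
from dataclasses import dataclass
from typing import List

# Assumptions drawn from the NaturalSpeech 3 paper, not from Amphion's code:
FRAME_RATE_HZ = 80        # 16 kHz audio, one code frame per 200 samples
CODEBOOK_SIZE = 1024      # entries per codebook
N_PROSODY, N_CONTENT, N_DETAIL = 1, 2, 3  # quantizers per attribute


@dataclass
class FactorizedCodes:
    prosody: List[List[int]]  # [quantizer][frame]
    content: List[List[int]]
    detail: List[List[int]]
    timbre: List[float]       # one global vector per utterance, no time axis


def _random_stream(rng: random.Random, n_quantizers: int, n_frames: int) -> List[List[int]]:
    """Placeholder code indices; a real generator would predict these."""
    return [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
            for _ in range(n_quantizers)]


def hypothetical_tts_generator(text: str, prompt_timbre: List[float],
                               seconds: float, seed: int = 0) -> FactorizedCodes:
    """Hypothetical stand-in for the unreleased NS3 model: text + prompt -> codes.

    The text is ignored here; the point is only the shape of the output that
    a FACodec-style decoder would then turn into a waveform.
    """
    rng = random.Random(seed)
    n_frames = round(seconds * FRAME_RATE_HZ)
    return FactorizedCodes(
        prosody=_random_stream(rng, N_PROSODY, n_frames),
        content=_random_stream(rng, N_CONTENT, n_frames),
        detail=_random_stream(rng, N_DETAIL, n_frames),
        timbre=list(prompt_timbre),  # speaker identity comes from the prompt audio
    )


codes = hypothetical_tts_generator("hello world", prompt_timbre=[0.0] * 256, seconds=2.0)
print(len(codes.content), len(codes.content[0]), len(codes.timbre))
```

The sketch only shows what a TTS front end would have to produce. FACodec itself covers the encode and decode steps, which is why it cannot read text aloud on its own.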

LearnMF commented 5 months ago

How can I use naturalspeech3 for TTS? Is there any demo Python code?

HeCheng0625 commented 4 months ago

Hi, currently NS3 is not open source due to commercial reasons, and we are still working hard to reproduce it.

yuantuo666 commented 4 months ago

Hi @aguang1201, if you have any further questions, feel free to re-open this issue. We are glad to follow up!