Closed: chadbrewbaker closed this issue 6 months ago
Would it be sane to get your model to support text-to-audio clips like this?
One of the DALL-E 3 engineers has a personal project called Tortoise-TTS, which includes a voice version of CLIP that he calls CLVP.
https://github.com/neonbjb/tortoise-tts/blob/1e061bc6752f05bccb59748c8bd7c7fc85d54988/tortoise/models/clvp.py#L24
I think he used lucidrains' CLIP as a template: https://github.com/lucidrains/DALLE-pytorch/blob/58c1e1a4fef10725a79bd45cdb5581c03e3e59e7/dalle_pytorch/dalle_pytorch.py#L272
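For anyone unfamiliar with the idea, here is a rough sketch of the kind of symmetric contrastive objective that CLIP-style models train with, applied to paired text and speech embeddings. It is not taken from either linked file; the function names, the plain-list embeddings, and the temperature of 0.07 are all assumptions chosen for illustration, and a real model would compute this on batched tensors from learned encoders.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # i.e. the i-th text should match the i-th speech clip.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(text_emb, speech_emb, temperature=0.07):
    # temperature=0.07 is an assumed illustrative value, not a setting from CLVP.
    text = [l2_normalize(v) for v in text_emb]
    speech = [l2_normalize(v) for v in speech_emb]
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(t, s)) / temperature for s in speech]
        for t in text
    ]
    # Transpose to score speech-to-text as well as text-to-speech.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))

# Toy embeddings: the loss should be small when pairs line up and large when shuffled.
text = [[1.0, 0.0], [0.0, 1.0]]
matched = contrastive_loss(text, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss(text, [[0.0, 1.0], [1.0, 0.0]])
```

The only thing that changes between an image version and a voice version of this objective is which encoder produces the second set of embeddings, which is why supporting audio looks like a natural extension.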
@VoVoR and @kimihailv, what do you think about this?
Hello. It is an interesting suggestion. However, it is not our priority for now.