jishengpeng / ControlSpeech

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

What is *.style.pt? #5

Open Darcy0218 opened 3 months ago

Darcy0218 commented 3 months ago

FileNotFoundError: [Errno 2] No such file or directory: '/data4/wuyikai/data/TextrolSpeech/LibriTTS/LibriTTS/train-clean-360/6300/39660/6300_39660_000014_000000.style.pt'

Darcy0218 commented 3 months ago

I can't make sense of it. From the codebase I see that a style embedding is required, loaded from a .style.pt file. Did you use a model to generate these style embeddings (.style.pt) before training? If so, which model?

jishengpeng commented 3 months ago

OK, I took a closer look (the second author helped run this baseline). This should be the style embedding, i.e. a style representation extracted with BERT. There are two ways to deal with it: one is to pre-extract the relevant representations following the original logic and save them, the other is to load the pre-trained BERT model directly during training. See the screenshot below for reference. [screenshot]
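For the pre-extraction route, a minimal sketch of what generating and saving a .style.pt file could look like, assuming a Hugging Face BERT (bert-base-uncased here) with mean pooling over non-padding tokens; the actual checkpoint and pooling used by ControlSpeech may differ:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model; ControlSpeech may use a different BERT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def extract_style_embed(style_prompt: str) -> torch.Tensor:
    """Encode a style-prompt sentence into a single style-embedding vector."""
    inputs = tokenizer(style_prompt, return_tensors="pt", truncation=True)
    hidden = bert(**inputs).last_hidden_state       # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_len, 1)
    # Mean-pool over the real (non-padding) tokens to get one vector.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, 768)

prompt = "A female speaker with a low-pitched, slow and sad voice."
embed = extract_style_embed(prompt)
torch.save(embed, "6300_39660_000014_000000.style.pt")
```

The second option amounts to calling the same encoder on the style prompt inside the dataset or forward pass at training time, instead of loading a saved tensor.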

In addition, having to work overtime on weekends is frustrating.

Darcy0218 commented 3 months ago

Haha, thanks for your effort, brother. Thank you for the reply.

xiezheng-cs commented 1 month ago

Hello, is style_embed obtained directly with the get_style_embed function in inference.py? Does the style_prompt text need any preprocessing?