open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.41k stars, 373 forks

Add sgmse implementation #177

Open lithr1 opened 4 months ago

lithr1 commented 4 months ago

See egs/sgmse/README.md for details. This PR is the final project for AIR6063: Spoken Language Processing. My name is 李思睿 (lisirui), 223040027.

✨ Description

Add sgmse implementation

👨‍💻 Changes Proposed

🧑‍🤝‍🧑 Who Can Review?

@HeCheng0625 @Adorable-Qin

✅ Checklist

HarryHe11 commented 4 months ago

@Adorable-Qin Hi Zihao, could you help review this PR on speech enhancement?

HarryHe11 commented 4 months ago

Hi Sirui, thank you so much for your helpful contribution! Could you provide some samples and checkpoints that showcase the effectiveness of your model?

lithr1 commented 4 months ago

My checkpoint has only been trained halfway compared to the original model (which needs 400,000 steps), because I lack computing resources (the original used eight GPUs). I used only one GPU and trained for 200,000 steps over five days, but this shows that the model is trainable. For detailed samples, please refer to this link: https://yiufjt4rn74.feishu.cn/docx/FOK8dfW9mo7AhyxJMDecQiFen7O?from=from_copylink

yuantuo666 commented 4 months ago

Hi Sirui, the access permissions on the Feishu doc are not configured correctly. Could you provide a link with public access so we can check the samples in detail?

yuantuo666 commented 4 months ago

To help us manage PRs, I have attached a checklist to the first message. Feel free to add more specific information, such as examples and a list of changes, to help us understand your contribution. Thanks!

lithr1 commented 4 months ago

I have opened up access to the document; please try the link again.

Adorable-Qin commented 3 months ago

Hi @lithr1!

Thank you for your efforts to improve Amphion.

However, the samples you attached do not sound as good as expected from the paper you are trying to reproduce. As you said, your model may need more training, or you may need to scale up the training dataset. I recommend not merging this PR until you get reasonable results; then we can consider merging.