Release Emilia and Emilia-Pipe

yuantuo666 commented 2 weeks ago

✨ Description

We release Emilia, an extensive, multilingual, and diverse dataset, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation.

Major contributors to this PR: @HarryHe11 @shangqwe123 @yuantuo666 @lixuyuan102

🚧 Related Issues

None

👨‍💻 Changes Proposed

[x] Update README.md News
[x] Add a README.md in preprocessors/Emilia directory to introduce Emilia-Pipe
[x] Integrate processing pipeline Emilia-Pipe under preprocessors/Emilia directory

🧑‍🤝‍🧑 Who Can Review?

@HarryHe11 @jiaqili3 @RMSnow @HeCheng0625

🛠 TODO

None

✅ Checklist

[x] Code has been reviewed
[x] Code complies with the project's code standards and best practices
[x] Code has passed all tests
[x] Code does not affect the normal use of existing features
[x] Code has been commented properly
[x] Documentation has been updated (if applicable)
[x] Demo/checkpoint has been attached (if applicable)

HarryHe11 commented 2 weeks ago

Hi Chaoren @yuantuo666, thank you so much for raising this PR, and thanks for all of your efforts, @yuantuo666 @shangqwe123 @lixuyuan102!

I notice that we haven't provided the list of source audio files in this PR yet. Maybe we could complete this part before merging!

HarryHe11 commented 2 weeks ago

@RMSnow Hi Xueyao, thank you so much for your helpful suggestions! I have addressed all of your comments accordingly!

HarryHe11 commented 1 week ago

@yuantuo666 @shangqwe123 @lixuyuan102 Chaoren, Zengqiang, Xuyuan,

I think we'd better merge this PR after we get our arXiv link and complete all of the following tasks.

[x] Code is up to date (all tasks are completed).
[x] Code is tested.
[x] Demo page is up to date (the abstract is consistent with our arXiv paper).
[x] Licenses and copyright declarations have been appropriately added on GitHub and Hugging Face.
[x] The meta-information for the source audio files is released on Hugging Face.
[x] Amphion's main README relays the news of Emilia's release and provides a link to the sub-repo.
[x] Emilia's README introduces the dataset adequately.
[x] Emilia's README provides appropriate URLs to: 1. the demo page; 2. the Hugging Face page; 3. the arXiv paper.
[x] Emilia's README, demo page, and Hugging Face page all reference Emilia's arXiv paper and Amphion.
