yuantuo666 closed this pull request 1 week ago.
Hi Chaoren @yuantuo666, thank you so much for raising this PR, and thanks to @yuantuo666, @shangqwe123, and @lixuyuan102 for all of your efforts!
I notice that we haven't provided the list of source audios in this PR yet. Maybe we could complete this part before merging!
@RMSnow Hi Xueyao, thank you so much for your helpful suggestions! I have addressed all of your comments accordingly!
@yuantuo666 @shangqwe123 @lixuyuan102 Chaoren, Zengqiang, Xuyuan,
I think we'd better merge this PR after we get our arXiv link and have completed all of the following tasks.
✨ Description
We release Emilia, an extensive, multilingual, and diverse dataset, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation.
Major contributors to this PR: @HarryHe11 @shangqwe123 @yuantuo666 @lixuyuan102
🚧 Related Issues
None
👨‍💻 Changes Proposed
- README.md: News
- README.md in preprocessors/Emilia directory to introduce Emilia-Pipe
- preprocessors/Emilia directory
🧑‍🤝‍🧑 Who Can Review?
@HarryHe11 @jiaqili3 @RMSnow @HeCheng0625
🛠 TODO
None
✅ Checklist