open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
7.3k stars 542 forks

Add preprocessing scripts for the librilight datasets #107

Closed HarryHe11 closed 9 months ago

HarryHe11 commented 9 months ago

✨ Description

This update introduces preprocessing scripts for the Libri-Light datasets, enhancing their usability and compatibility with our processing workflows.

🚧 Related Issues

No related issues.

👨‍💻 Changes Proposed

🧑‍🤝‍🧑 Who Can Review?

🛠 TODO

✅ Checklist

HarryHe11 commented 9 months ago

In this comment, I provide screenshots from testing the implemented scripts on Libri-Light-tiny (a custom split from Libri-Light-small).

The Running Process

preprocessors/librilight.py

Part 1 of 5:

[screenshot]

Part 2 of 5:

[screenshot]

Part 3 of 5:

[screenshot]

Part 4 of 5:

[screenshot]

Part 5 of 5:

[screenshot]

The Outcome:

Processed Data:

[screenshot]

MetaData:

[screenshot]

HarryHe11 commented 9 months ago

> Thanks for your efforts. Please check out the comments.

Thank you so much for reviewing my PR. I have addressed your concerns; please see my most recent commits.

lmxue commented 9 months ago

LGTM. P.S. This PR has been tested on Libri-Light-tiny. However, three other subdatasets need to be tested as listed in TODO. You may need to test them when the dataset is ready.

HarryHe11 commented 9 months ago

> LGTM.
>
> P.S. This PR has been tested on Libri-Light-tiny. However, three other subdatasets need to be tested as listed in TODO. You may need to test them when the dataset is ready.

Sure, I will test them then.