maum-ai / assem-vc

Official Code for Assem-VC @ICASSP2022
https://mindslab-ai.github.io/assem-vc/
BSD 3-Clause "New" or "Revised" License

How to split singing voices #36

Open betty97 opened 2 years ago

betty97 commented 2 years ago

Hi, I am trying to reproduce the results presented in the paper "Controllable and Interpretable Singing Voice Decomposition via Assem-VC" with the CSD and NUS-48E datasets, and also with custom datasets. The paper states that "all singing voices are split between 1-12 seconds and used for training with corresponding lyrics". As I understand it, the original .wav files of each dataset need to be split into shorter .wav files before building the metadata files in the format "path_to_wav|transcription|speaker_id". However, I can't find any code in the repository that does this. How is the splitting done? Is it done manually for every dataset?
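
To make the question concrete, here is a minimal sketch of the kind of preprocessing I have in mind. It is only an assumption about how the splitting could be done, not the authors' method and not code from the repository. The function names (`plan_segments`, `cut_wav`, `metadata_line`) are hypothetical, and the candidate cut points (`boundaries`) are assumed to come from somewhere else, for example silence detection or the lyric line timings shipped with a dataset. Only the 1-12 second range and the metadata format are taken from the paper and the repository.

```python
import wave
from typing import List, Tuple


def plan_segments(boundaries: List[float],
                  min_len: float = 1.0,
                  max_len: float = 12.0) -> List[Tuple[float, float]]:
    """Greedily merge consecutive pieces into segments of min_len..max_len seconds.

    `boundaries` is a sorted list of candidate cut points in seconds,
    including the start (0.0) and the end of the recording.
    ASSUMPTION: where these cut points come from is up to the user.
    """
    segments = []
    start = boundaries[0]
    prev = start
    for t in boundaries[1:]:
        # Extending to t would exceed max_len: close the segment at the
        # previous boundary, provided it is long enough to keep.
        if t - start > max_len and prev - start >= min_len:
            segments.append((start, prev))
            start = prev
        # A stretch with no usable boundary is cut hard every max_len
        # seconds (this may cut through a word).
        while t - start > max_len:
            segments.append((start, start + max_len))
            start += max_len
        prev = t
    # Keep the tail only if it reaches min_len; a shorter tail is dropped.
    if prev - start >= min_len:
        segments.append((start, prev))
    return segments


def cut_wav(src: str, dst: str, start: float, end: float) -> None:
    """Copy the audio between `start` and `end` (seconds) of a PCM .wav file."""
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        params = reader.getparams()
        reader.setpos(int(start * rate))
        frames = reader.readframes(int((end - start) * rate))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # frame count in the header is fixed on close
        writer.writeframes(frames)


def metadata_line(path: str, transcription: str, speaker_id: str) -> str:
    """Format one entry as path_to_wav|transcription|speaker_id."""
    return f"{path}|{transcription}|{speaker_id}"


if __name__ == "__main__":
    # Hypothetical cut points for a 30-second recording.
    print(plan_segments([0.0, 5.0, 10.0, 15.0, 30.0]))
    # [(0.0, 10.0), (10.0, 15.0), (15.0, 27.0), (27.0, 30.0)]
```

Even with something like this, the lyrics for each segment would still have to be matched to the cut audio by hand or with an aligner, which is the part I am least sure about.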

Thanks!