hayeong0 / DDDM-VC

Official PyTorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
https://hayeong0.github.io/DDDM-VC-demo/
160 stars, 18 forks

Hello, I can't seem to find the data preprocessing code #5

Closed: wlf0322 closed this issue 4 months ago

wlf0322 commented 5 months ago

[Screenshot: Snipaste_2024-03-28_17-57-52]

Files like train_wav_final.txt in the data directory

hayeong0 commented 4 months ago

Hello,

The file list directory structure for training is as follows:

|-- filelist 
|    |-- train_f0.txt
|    |-- train_wav.txt
|    |-- test_f0.txt
|    `-- test_wav.txt

This is an example of each file list.

train_wav.txt
/workspace/raid/dataset/LibriTTS_16k/train-clean-360/100/121669/100_121669_000001_000000.wav
/workspace/raid/dataset/LibriTTS_16k/train-clean-360/100/121669/100_121669_000003_000000.wav
/workspace/raid/dataset/LibriTTS_16k/train-clean-360/100/121669/100_121669_000004_000000.wav

train_f0.txt
/workspace/raid/dataset/LibriTTS_f0_norm/train-clean-360/100/121669/100_121669_000001_000000.pt
/workspace/raid/dataset/LibriTTS_f0_norm/train-clean-360/100/121669/100_121669_000003_000000.pt
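
For readers who need to produce these lists themselves, here is a minimal sketch of one way to do it. It is not taken from the repository. The function name `build_filelists` and its parameters are hypothetical, and it assumes that the F0 tree mirrors the wav tree with `.pt` in place of `.wav` (as the example paths suggest) and that the two lists should stay aligned line by line.

```python
from pathlib import Path


def build_filelists(wav_root, f0_root, out_dir, split="train"):
    """Write <split>_wav.txt and <split>_f0.txt with one path per line.

    Hypothetical helper: assumes every F0 file sits at the same relative
    path as its wav file, with a .pt extension instead of .wav.
    """
    wav_root, f0_root, out_dir = Path(wav_root), Path(f0_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    wav_lines, f0_lines = [], []
    for wav_path in sorted(wav_root.rglob("*.wav")):
        # Mirror the relative path under f0_root and swap the extension.
        f0_path = (f0_root / wav_path.relative_to(wav_root)).with_suffix(".pt")
        if not f0_path.exists():
            continue  # skip utterances whose F0 has not been extracted
        wav_lines.append(str(wav_path))
        f0_lines.append(str(f0_path))

    # Line i of the wav list and line i of the F0 list refer to the same utterance.
    (out_dir / f"{split}_wav.txt").write_text("".join(p + "\n" for p in wav_lines))
    (out_dir / f"{split}_f0.txt").write_text("".join(p + "\n" for p in f0_lines))
    return len(wav_lines)
```

With the layout from the examples, a call might look like `build_filelists("/workspace/raid/dataset/LibriTTS_16k", "/workspace/raid/dataset/LibriTTS_f0_norm", "filelist")`. How the data is split into train and test is not covered here.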

Here, each line of the wav list is the path to a training waveform, and each line of the F0 list is the path to that utterance's F0, extracted with the YAAPT pitch extractor and normalized per speaker.
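
The reply does not give the normalization formula, so the following is only a sketch of one common convention: z-scoring the voiced frames with each speaker's own mean and standard deviation. The function names, the use of 0.0 to mark unvoiced frames, and the formula itself are assumptions. F0 extraction and saving to `.pt` are left out.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_id(path):
    # LibriTTS file names start with the speaker ID,
    # e.g. 100_121669_000001_000000 belongs to speaker "100".
    return path.rsplit("/", 1)[-1].split("_", 1)[0]


def normalize_per_speaker(f0_by_utt):
    """Z-score voiced F0 frames per speaker (assumed convention).

    f0_by_utt maps an utterance path to a list of F0 values in Hz,
    where 0.0 marks an unvoiced frame (an assumption of this sketch).
    """
    # Pool the voiced frames of every utterance by speaker.
    voiced = defaultdict(list)
    for path, f0 in f0_by_utt.items():
        voiced[speaker_id(path)].extend(v for v in f0 if v > 0)

    # Per-speaker mean and standard deviation; fall back to 1.0 if std is 0.
    stats = {
        spk: (mean(vals), pstdev(vals) or 1.0)
        for spk, vals in voiced.items()
        if vals
    }

    # Normalize voiced frames and leave unvoiced frames at 0.0.
    normalized = {}
    for path, f0 in f0_by_utt.items():
        mu, sigma = stats.get(speaker_id(path), (0.0, 1.0))
        normalized[path] = [(v - mu) / sigma if v > 0 else 0.0 for v in f0]
    return normalized
```

Pooling statistics per speaker rather than per utterance keeps the relative pitch movement within a speaker while removing that speaker's overall pitch level.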

Thanks.

wlf0322 commented 4 months ago

Thank you very much.