jxzhanggg / nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC
MIT License

> They are my prepared training list. #43

Closed: JeffC0628 closed this issue 3 years ago

JeffC0628 commented 3 years ago

They are my prepared training list. Each line looks like this:

    spectrogram_path acoustic_frame_number phone_number

For example:

    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_090.npy 135 22
    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_118.npy 145 23
    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_014.npy 365 52
    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_179.npy 103 11
    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_309.npy 57 7
    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_353.npy 142 24
    /home/jxzhang/Documents/DataSets/VCTK/spec/p225/log-spec-p225_012.npy 310 49
    ...
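For readers who want to inspect such a list, here is a minimal sketch of a parser for the three whitespace-separated fields described above. The names `TrainingEntry` and `parse_training_list` are hypothetical and are not part of the repository; the only assumption about the format is the one stated in the comment (path, then frame count, then phone count).

```python
from typing import List, NamedTuple


class TrainingEntry(NamedTuple):
    """One line of the training list (hypothetical helper type)."""
    spectrogram_path: str
    acoustic_frame_number: int
    phone_number: int


def parse_training_list(text: str) -> List[TrainingEntry]:
    """Parse lines of the form: <path> <frame count> <phone count>."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so a path containing spaces stays intact.
        path, frames, phones = line.rsplit(None, 2)
        entries.append(TrainingEntry(path, int(frames), int(phones)))
    return entries


sample = "spec/p225/log-spec-p225_090.npy 135 22\nspec/p225/log-spec-p225_309.npy 57 7\n"
for entry in parse_training_list(sample):
    print(entry.spectrogram_path, entry.acoustic_frame_number, entry.phone_number)
```

Parsing both lists this way makes it easy to compare frame counts per utterance while confirming that the phone counts match.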

How did you get the acoustic_frame_number (e.g. 135, 145, 365, 103, 57, ...)? I used the extract_mel_spec function in _extractfeatures.py, but got different frame counts, although the phone_number is the same. Here is my result:

    /VCTK/spec/p225/log-spec-p225_090.npy 346 22
    /VCTK/spec/p225/log-spec-p225_118.npy 315 23
    /VCTK/spec/p225/log-spec-p225_014.npy 533 52
    /VCTK/spec/p225/log-spec-p225_179.npy 250 11

And my parameters are:

    y, sample_rate = librosa.load(filename, sr=16000)
    spec = librosa.core.stft(y=y, n_fft=2048, hop_length=200, win_length=800,
                             window='hann', center=True, pad_mode='reflect')
    spec = librosa.magphase(spec)[0]
    log_spectrogram = np.log(spec).astype(np.float32)
    mel_spectrogram = librosa.feature.melspectrogram(S=spec, sr=sample_rate, n_mels=80,
                                                     power=1.0, fmin=0.0, fmax=None,
                                                     htk=False, norm=1)
    log_mel_spectrogram = np.log(mel_spectrogram).astype(np.float32)

_Originally posted by @Alphadone in https://github.com/jxzhanggg/nonparaSeq2seqVC_code/issues/2#issuecomment-741600771_

atravler commented 3 years ago

How did you get the number of acoustic frames?

KunZhou9646 commented 3 years ago

Because the sampling rates are different. The original repo uses a default sampling rate of 22.05 kHz, but you are using 16 kHz. I guess that is why the number of acoustic frames is different.
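To check this kind of mismatch by hand, the arithmetic can be sketched as follows. With a centered STFT (as in `center=True` above), the frame count is `1 + num_samples // hop_length`, so it depends on both the sampling rate and the hop length. The helper name `stft_frame_count` and the 4-second clip are illustrative assumptions; the hop length used to build the original list is not stated in this thread, and any silence trimming would change the counts further.

```python
def stft_frame_count(num_samples: int, hop_length: int,
                     center: bool = True, n_fft: int = 2048) -> int:
    """Number of STFT frames for a signal of num_samples samples.

    center=True pads n_fft // 2 samples on each side, which for an even
    n_fft gives 1 + num_samples // hop_length frames.
    """
    if center:
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length


# Hypothetical 4-second clip, resampled to two different rates.
duration_seconds = 4
for sample_rate in (16000, 22050):
    num_samples = duration_seconds * sample_rate
    frames = stft_frame_count(num_samples, hop_length=200)
    print(sample_rate, num_samples, frames)
```

Because the same audio yields different counts whenever the sampling rate or hop length changes, it is worth comparing both settings (and any trimming step) against the configuration that produced the original list.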