JeffC0628 closed this issue 3 years ago
How did you get the number of acoustic frames?
The sampling rate is probably the cause. The original repo uses a default sampling rate of 22.05 kHz, but you are loading the audio at 16 kHz. I guess that is why the number of acoustic frames is different.
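To illustrate the point, here is a minimal sketch of how the frame count depends on the sampling rate. It relies on the fact that `librosa.stft` with `center=True` returns `1 + len(y) // hop_length` frames. The helper name `num_stft_frames`, the 2-second clip, and the use of `hop_length=200` at both sampling rates are assumptions made for illustration; they are not taken from the repo.

```python
def num_stft_frames(duration_s: float, sr: int, hop_length: int) -> int:
    """Frame count of a centered STFT (center=True) for a clip of this length.

    Hypothetical helper for illustration, not part of the repo.
    """
    # Samples after loading (and resampling) the clip at `sr`.
    # Assumes duration_s * sr is close to a whole number of samples.
    num_samples = int(round(duration_s * sr))
    # With center=True the signal is padded so that frames are centered on
    # multiples of hop_length, which gives 1 + num_samples // hop_length frames.
    return 1 + num_samples // hop_length


# Same 2-second clip and same hop length, two different sampling rates.
frames_16k = num_stft_frames(2.0, sr=16000, hop_length=200)
frames_22k = num_stft_frames(2.0, sr=22050, hop_length=200)
print(frames_16k, frames_22k)
```

The phone sequence depends only on the transcript, so it stays the same while the frame count changes with `sr` and `hop_length`.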
How did you get the acoustic_frame_number values (e.g. 135, 145, 365, 103, 57, ...)? I used the extract_mel_spec function in _extractfeatures.py but got different frame counts, although the phone_number values are the same. Here is my result:
My parameters are:

    y, sample_rate = librosa.load(filename, sr=16000)
    spec = librosa.core.stft(y=y, n_fft=2048, hop_length=200, win_length=800, window='hann', center=True, pad_mode='reflect')
    spec = librosa.magphase(spec)[0]
    log_spectrogram = np.log(spec).astype(np.float32)
    mel_spectrogram = librosa.feature.melspectrogram(S=spec, sr=sample_rate, n_mels=80, power=1.0, fmin=0.0, fmax=None, htk=False, norm=1)
    log_mel_spectrogram = np.log(mel_spectrogram).astype(np.float32)
_Originally posted by @Alphadone in https://github.com/jxzhanggg/nonparaSeq2seqVC_code/issues/2#issuecomment-741600771_