yl4579 / AuxiliaryASR

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
MIT License

Why does mel-spectrogram feature extraction use only MEL_PARAMS here? #6

Closed: superhg closed this issue 2 years ago

superhg commented 2 years ago

Hi, why does the mel-spectrogram feature extraction use only MEL_PARAMS? Why is SPECT_PARAMS not used?

https://github.com/yl4579/AuxiliaryASR/blob/7bca68a111545a3e92d7e7f88a0639fef87ce7ea/meldataset.py#L50

superhg commented 2 years ago

Another question: can I train on data with a 16000 Hz sampling rate by just modifying SPECT_PARAMS?

yl4579 commented 2 years ago

Because it was a mistake from the very beginning. See https://github.com/yl4579/StarGANv2-VC/issues/10. I have fixed that part, but I don't think it'll make too much difference. The only thing you have to work out is the vocoder.

yl4579 commented 2 years ago

As for the 16 kHz sampling rate, I think it should work, as long as your application (VC or TTS) also uses 16 kHz and your vocoder is trained with the same settings.
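
A minimal sketch of how one might check that condition before training. Everything here is an assumption for illustration: the names `ASR_MEL_PARAMS`, `VOCODER_MEL_PARAMS`, `mel_param_mismatches`, and `frame_timing_ms` are hypothetical, and the numeric values are not taken from this repository or any particular vocoder.

```python
# Hypothetical mel settings for a 16 kHz setup. The values are illustrative
# assumptions, not copied from the repository or from any vocoder config.
ASR_MEL_PARAMS = {
    "sample_rate": 16000,
    "n_fft": 1024,
    "win_length": 800,
    "hop_length": 200,
    "n_mels": 80,
}

# The vocoder (and the VC/TTS model) should be trained with the same settings.
VOCODER_MEL_PARAMS = dict(ASR_MEL_PARAMS)


def mel_param_mismatches(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def frame_timing_ms(params):
    """Convert window and hop sizes from samples to milliseconds."""
    sr = params["sample_rate"]
    return {
        "win_ms": 1000 * params["win_length"] / sr,
        "hop_ms": 1000 * params["hop_length"] / sr,
    }


if __name__ == "__main__":
    diff = mel_param_mismatches(ASR_MEL_PARAMS, VOCODER_MEL_PARAMS)
    if diff:
        print("Mel settings differ between ASR and vocoder:", diff)
    else:
        print("Mel settings match:", frame_timing_ms(ASR_MEL_PARAMS))
```

Comparing the window and hop in milliseconds, rather than in samples, makes it easier to see whether a resampled setup keeps the same frame rate as the original one.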