zceng / LVCNet

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Apache License 2.0

why -4 #2

Closed hdmjdp closed 3 years ago

hdmjdp commented 3 years ago

https://github.com/ZENGZHEN-TTS/LVCNet/blob/6e7748e7ef358f7d95b8ce08bb682e7993f9c639/vocoder/models/lvcnet.py#L76

https://github.com/ZENGZHEN-TTS/LVCNet/blob/6e7748e7ef358f7d95b8ce08bb682e7993f9c639/vocoder/models/lvcnet.py#L45

What if I pad by 2?

zceng commented 3 years ago

It comes from the length relationship between the input waveform and the mel-spectrum.

https://github.com/ZENGZHEN-TTS/LVCNet/blob/e81e13f3479a4d85f498a02e42338ebe823a8b3d/vocoder/datasets/audio_mel.py#L59-L65

As shown above, the length of the input waveform (audio) equals the length of the mel-spectrum minus 4, multiplied by hop_length, i.e. audio_length = (mel_length - 4) * hop_length.
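To make the length relationship concrete, here is a minimal sketch and not the repository's actual dataset code. It assumes the 4 comes from keeping 2 extra mel frames of context on each side of the cropped segment, similar to the context window in Parallel WaveGAN. The names `crop_segment`, `HOP_LENGTH` and `CONTEXT`, and the values used, are hypothetical.

```python
import numpy as np

HOP_LENGTH = 256  # hypothetical hop size, for illustration only
CONTEXT = 2       # assumed extra mel frames per side, so 2 * CONTEXT = 4


def crop_segment(audio, mel, start_frame, n_frames,
                 hop_length=HOP_LENGTH, context=CONTEXT):
    """Crop an aligned (audio, mel) pair; the mel keeps `context` extra frames per side.

    audio: shape (total_frames * hop_length,)
    mel:   shape (total_frames, n_mels)
    """
    # The mel segment is longer than the audio segment by 2 * context frames.
    mel_seg = mel[start_frame - context:start_frame + n_frames + context]
    # The audio segment covers only the central n_frames frames.
    audio_seg = audio[start_frame * hop_length:(start_frame + n_frames) * hop_length]
    return audio_seg, mel_seg


total_frames, n_mels = 100, 80
mel = np.zeros((total_frames, n_mels), dtype=np.float32)
audio = np.zeros(total_frames * HOP_LENGTH, dtype=np.float32)

audio_seg, mel_seg = crop_segment(audio, mel, start_frame=10, n_frames=32)

# audio_length == (mel_length - 4) * hop_length
assert len(audio_seg) == (len(mel_seg) - 2 * CONTEXT) * HOP_LENGTH
```

If the data pipeline does not keep those extra frames (`CONTEXT = 0` in this sketch), the relationship reduces to audio_length = mel_length * hop_length, and the model side would need no matching subtraction.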

zceng commented 3 years ago

A similar process can be found in Parallel WaveGAN.

https://github.com/kan-bayashi/ParallelWaveGAN/blob/53d14969089b3d3229fe6bfce221234f25a9d836/parallel_wavegan/bin/train.py#L548-585

hdmjdp commented 3 years ago

OK. In my data processing I did not subtract 4, so in my version I think there is no need for "(cond_length - 4)".