winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
MIT License

Model output size #14

Shmuel-Gruel closed this issue 1 year ago

Shmuel-Gruel commented 1 year ago

Hello again, can you help me understand the shape of the model's output? It looks like the output is (feature_length, mel_bands). Is that correct? It appears the CPC feature length is not the same as the original mel-spectrogram length. Does that affect the length of the converted audio? Thank you for your help :)

winddori2002 commented 1 year ago

Hi,

The output (mel-spectrogram) shape you mentioned is correct. The length difference between the CPC features and the mel-spectrogram is about 1 frame. It does not affect performance much, especially if we match the resolution of the two features.

Thanks
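
For readers who want to see what a one-frame mismatch amounts to, here is a minimal sketch. It is not code from this repository. The helper names `match_lengths` and `approx_num_samples`, the feature dimensions (256 for CPC, 80 mel bands), and the hop length of 160 samples are all assumptions made for illustration. It also assumes the vocoder produces roughly `hop_length` samples per mel frame.

```python
def match_lengths(cpc, mel):
    """Trim two frame-major feature sequences to their common frame count.

    Hypothetical helper: both inputs are indexed as [frame][dimension].
    """
    n_frames = min(len(cpc), len(mel))
    return cpc[:n_frames], mel[:n_frames]


def approx_num_samples(n_frames, hop_length):
    """Rough waveform length, assuming one hop of samples per frame."""
    return n_frames * hop_length


# Illustrative shapes only (assumed, not taken from the repository):
# 100 CPC frames of 256 dims, 101 mel frames of 80 bands.
cpc = [[0.0] * 256 for _ in range(100)]
mel = [[0.0] * 80 for _ in range(101)]

cpc_trimmed, mel_trimmed = match_lengths(cpc, mel)
print(len(cpc_trimmed), len(mel_trimmed))

# Dropping one frame changes the estimated audio length by one hop.
hop_length = 160  # assumed value
print(approx_num_samples(len(mel), hop_length)
      - approx_num_samples(len(mel_trimmed), hop_length))
```

Under these assumptions, trimming the extra frame shortens the converted audio by a single hop, which is 10 ms if the sample rate is 16 kHz.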