rosinality / vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch

Cannot reconstruct when using mel spectrograms as data #21

Open LiNaihan opened 4 years ago

LiNaihan commented 4 years ago

I used the code and settings as-is, with one change: I used conv1d to process mel spectrograms, which can be treated as 1-D data with 80 channels. However, the reconstruction is quite poor and the loss converges to a large value. Do you have any guess about the cause, or any suggestions for debugging? Thanks a lot!
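One cheap thing to rule out when moving to conv1d is a length mismatch between input and reconstruction. The sketch below only does the shape arithmetic, using the output-length formulas documented for PyTorch's `Conv1d` and `ConvTranspose1d`. The kernel size 4, stride 2, padding 1 and the two downsampling stages are assumptions for illustration, not necessarily the settings used in this setup, and `round_trip_len` is a hypothetical helper name.

```python
def conv1d_out_len(length, kernel=4, stride=2, padding=1, dilation=1):
    """Output length of a 1-D convolution (PyTorch Conv1d formula)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose1d_out_len(length, kernel=4, stride=2, padding=1,
                             dilation=1, output_padding=0):
    """Output length of a 1-D transposed convolution (PyTorch ConvTranspose1d formula)."""
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


def round_trip_len(length, n_down=2):
    """Length after n_down stride-2 convs followed by n_down transposed convs.

    Assumed layer settings (kernel 4, stride 2, padding 1) are illustrative only.
    """
    for _ in range(n_down):
        length = conv1d_out_len(length)
    for _ in range(n_down):
        length = conv_transpose1d_out_len(length)
    return length


if __name__ == "__main__":
    for n_frames in (128, 127):
        print(n_frames, "->", round_trip_len(n_frames))
```

Under these assumed settings, a frame count that is not a multiple of the total downsampling factor comes back shorter than it went in, so padding or cropping each spectrogram to a suitable length before computing the loss is worth checking.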

rosinality commented 4 years ago

Sorry for the late reply. I haven't tried this model in the audio domain, but I suspect that data normalization and preprocessing are crucial for log mel spectrograms, as this model doesn't have any dedicated normalization layers.
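As a rough illustration of the kind of normalization meant here, the sketch below standardizes log mel spectrograms per mel bin using statistics computed over the training set. It assumes each spectrogram is a NumPy array shaped `(80, n_frames)`; the function names `fit_mel_stats`, `normalize`, and `denormalize` are hypothetical and not part of this repository, and other schemes (for example global min-max scaling) may work as well or better.

```python
import numpy as np


def fit_mel_stats(mels, eps=1e-5):
    """Compute per-mel-bin mean and std over a list of (n_mels, n_frames) arrays."""
    frames = np.concatenate(mels, axis=1)          # (n_mels, total_frames)
    mean = frames.mean(axis=1, keepdims=True)      # (n_mels, 1)
    std = frames.std(axis=1, keepdims=True) + eps  # eps avoids division by zero
    return mean, std


def normalize(mel, mean, std):
    """Standardize one spectrogram so each mel bin has roughly zero mean, unit variance."""
    return (mel - mean) / std


def denormalize(mel_norm, mean, std):
    """Invert normalize(), e.g. before passing a reconstruction to a vocoder."""
    return mel_norm * std + mean


if __name__ == "__main__":
    # Synthetic stand-in for log mel data: 80 bins, variable number of frames.
    rng = np.random.default_rng(0)
    train = [rng.normal(-5.0, 3.0, size=(80, t)) for t in (120, 200, 160)]
    mean, std = fit_mel_stats(train)
    x = normalize(train[0], mean, std)
    print(x.shape, float(x.mean()), float(x.std()))
```

The statistics should be computed once on the training set and reused at inference time, so that reconstructions can be mapped back to the original log mel scale with `denormalize`.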