Open LiNaihan opened 4 years ago
Sorry for the late reply. I haven't tried this model in the audio domain, but I suspect that data normalization and preprocessing are crucial for log-mel spectrograms, since this model has no dedicated normalization layers.
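To make the normalization suggestion concrete, here is a minimal sketch of per-channel standardization for log-mel spectrograms. It is not from this repository: the helper names (`fit_mel_stats`, `normalize`, `denormalize`), the `(n_mels, n_frames)` layout, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def fit_mel_stats(log_mels):
    """Compute per-mel-channel mean and std over all training frames.

    log_mels: iterable of arrays shaped (n_mels, n_frames); the layout is
    an assumption, matching a conv1d input with n_mels channels.
    """
    frames = np.concatenate(list(log_mels), axis=1)  # (n_mels, total_frames)
    mean = frames.mean(axis=1, keepdims=True)        # (n_mels, 1)
    std = frames.std(axis=1, keepdims=True)          # (n_mels, 1)
    # Floor the std so silent or constant channels do not divide by ~0.
    return mean, np.maximum(std, 1e-5)

def normalize(log_mel, mean, std):
    """Standardize each mel channel to roughly zero mean, unit variance."""
    return (log_mel - mean) / std

def denormalize(x, mean, std):
    """Map model output back to the original log-mel scale."""
    return x * std + mean

# Synthetic stand-in for real log-mel data (hypothetical values).
rng = np.random.default_rng(0)
train = [rng.normal(-5.0, 2.0, size=(80, n)) for n in (120, 200, 90)]

mean, std = fit_mel_stats(train)
x = normalize(train[0], mean, std)
```

The statistics would be computed once on the training set and reused at inference time; the reconstruction loss is then measured on the standardized values, and `denormalize` recovers the log-mel scale before vocoding or plotting.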
I reused the code and settings, with one change: I use conv1d to process mel spectrograms, which can be treated as 1-D data with 80 channels. However, the reconstruction is quite poor and training converges to a large loss. Do you have any guess at the cause, or suggestions for debugging? Thanks a lot!