Closed — clairerity closed this issue 2 years ago
Hi, thanks a lot. I am glad you like it 🙂
If I understand you correctly, you would like to reconstruct a mel spectrogram that you obtained from a WAV file using librosa.
However, the demo (in this cell — is the output of that cell what you want?) also extracts a mel spectrogram from the raw audio using librosa:
https://github.com/v-iashin/SpecVQGAN/blob/eee222d8351df9b6314db69185d5ce8ca55b50c8/feature_extraction/demo_utils.py#L348-L353
which calls get_spectrogram(), implemented here:
https://github.com/v-iashin/SpecVQGAN/blob/eee222d8351df9b6314db69185d5ce8ca55b50c8/feature_extraction/extract_mel_spectrogram.py#L166-L187
and here are the transforms you need to apply to convert the sound samples to a mel spectrogram: https://github.com/v-iashin/SpecVQGAN/blob/eee222d8351df9b6314db69185d5ce8ca55b50c8/feature_extraction/extract_mel_spectrogram.py#L141-L151
Just make sure your mel spectrogram is extracted with the same parameters and that you apply the same transforms (log, scaling, etc.; see TRANSFORMS(x)).
Also, check whether the Neural Audio Codec Colab demo makes it any clearer.
Hello, thank you very much for these! I will check them out! :D
Hello! First of all, thanks for this wonderful repo. I would just like to ask how to reconstruct the mel spectrogram I generated with librosa. I can do this via VQGAN using this code:
Here, xrec is the reconstructed image (from VQGAN).
I also add a preprocessing step before reconstructing, using this code (the same one from DALL-E's VQVAE):
In the end, I just call these two functions to reconstruct the image.
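For context, a minimal sketch of the two functions described above might look like this. The names preprocess_vqgan and reconstruct_with_vqgan are placeholders I chose for illustration, and the eps value mimics the DALL-E-style pixel mapping mentioned; the actual code I use may differ in details.

```python
import torch

def preprocess_vqgan(x, eps=0.1):
    # DALL-E-style pixel mapping (assumed here): squeeze pixel values from
    # [0, 1] into [eps, 1 - eps] so they stay away from the boundaries.
    return (1 - 2 * eps) * x + eps

def reconstruct_with_vqgan(x, model):
    # Encode into the discrete latent space, then decode back; xrec is the
    # reconstruction the VQGAN produces from its codebook entries.
    z, _, [_, _, indices] = model.encode(x)
    xrec = model.decode(z)
    return xrec
```

The question is essentially how to run this same encode/decode round trip with the SpecVQGAN model, but on a librosa mel spectrogram instead of an image.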
I was wondering how I could use your model instead to reconstruct in a similar way. I just checked the demo and saw that it extracts the audio from a video; I'm wondering how I can directly reconstruct a mel spectrogram generated with librosa.
Thank you very much in advance :D