ixobert / birds-generation


Question: Conversion from .npy to .wav? #8

Closed haydensflee closed 5 months ago

haydensflee commented 5 months ago

Thanks for the detailed response to my previous questions about the generation process in #3 !

I've been able to run the model successfully using the pre-trained weights from the .ckpt file you provided. I now have a few .npy spectrogram files that I'd like to listen to. I've tried a few methods to convert them back into .wav files, but the results don't play in a sound player. Did you use any tools to convert the .npy files into .wav files that we can listen to?

Also, after researching this topic online: does converting a spectrogram back to audio lose any information? Are there assumptions that we have to make in the inversion process?

Cheers

ixobert commented 5 months ago

Hi @haydensflee, yes, I coded a Streamlit app to inspect the models. I just added it to the repo: interactive app

#Command to run
streamlit run interactive_app.py

The tool lets you:

I would recommend using wav files of at least 3 seconds, but feel free to break things.

Keep in mind that the generated spectrograms are mainly intended for training purposes (with data augmentation), so they might not be pleasant to listen to at high volume.

Regarding the lossy conversion from spectrogram to audio

Yes, the process used in this paper to convert the spectrogram back to audio is lossy. In our study the conversion is not an issue because the spectrograms that we generate are fed directly to a CNN classifier, so there is no need to go back to audio.

However, if one needs audio instead, they will need to convert the spectrogram back to audio using a method like (Griffin & Lim, 1984) (see section 1.4 of the ECOGEN paper). This paper instead tries to convert spectrograms back to audio with good quality using a neural net. The authors claim better quality and faster processing compared to (Griffin & Lim, 1984) on speech synthesis.