Closed · youssefavx closed this issue 3 years ago
Hi there, and thanks for using Surfboard!
You can find more information about the MFCCs computed with Surfboard here. It does not mean that the wave audio file is split into 13 parts: 13 is the number of coefficients computed per analysis frame. The linked tutorial is a great way to understand what the MFCCs mean.
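To make this concrete, here is a minimal sketch (using only NumPy, with a fabricated feature array standing in for Surfboard's output) of what "13 MFCCs" and "mean and standard deviation of every MFCC" mean: you get 13 coefficients per frame, and the summary statistics are taken per coefficient across time, not by splitting the file into 13 parts.

```python
import numpy as np

# Hypothetical MFCC feature array: 13 coefficients per frame, 200 frames.
# (In practice this would come from the library; here we fabricate one.)
num_mfccs, num_frames = 13, 200
mfcc_array = np.random.default_rng(0).normal(size=(num_mfccs, num_frames))

# "Mean and standard deviation of every MFCC" = one mean and one std
# per coefficient, computed across time (axis=1).
means = mfcc_array.mean(axis=1)  # shape (13,)
stds = mfcc_array.std(axis=1)    # shape (13,)

# Concatenating yields one fixed-length vector per file: 13 + 13 = 26 numbers.
file_vector = np.concatenate([means, stds])
print(file_vector.shape)  # (26,)
```

Note how the time dimension (200 frames here) disappears after pooling, which is what turns a variable-length audio file into a single fixed-size vector.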
Your project sounds very interesting! Surfboard lets you extract either one feature vector per audio file, or feature arrays (sequences of feature vectors, each computed on a slice of the audio; these are the `_slidingwindow` components) if you use the `compute-components` functionality. You would have to read through the different components Surfboard offers to judge their relevance to music.
I hope this helps. Feel free to ask any follow-up questions.
Thank you so much! This is so exciting. I wasn't even thinking about different components, but I wonder how that might influence the sound. I will let you know if I have any further questions, thanks again!
No worries, and best of luck!
Hey guys, thanks for making this and increasing accessibility for everyone!
You say in your readme:
"This config will compute the mean and standard deviation of every MFCC (13 by default but set to 26 here) and log mel-spectrogram filterbank (128 by default but 64 here) on every .wav file in my_wav_folder if called with the following command"
My first question is: what does "every MFCC" mean? Does this mean that, given a wave audio file, it will split it into 13 parts and then compute features for each part?
Second question: I want to use this to do something like word2vec, but for music. I want to download sounds, or musical elements of different lengths, and extract vectors that capture how similar two sounds are.
I want to do this to create a kind of collage. Given a piece of music, I want to split it into little pieces and reconstruct it from other sounds that are actually different, but similar in qualities like the most dominant notes in the audio.
Do you think your library might come in handy in such a project?
Basically I'm unsure if feature extraction == sound2vec in this case.