[DMP 2024]: K-means clustering of chroma vectors generated from audio data in wav format

Shashankss1205 commented 5 months ago

This issue follows up on Issue #82 with a concrete idea based on the third approach listed there. I am Shashank Shekhar Singh, a sophomore at IIT BHU, India, with an interest in machine learning model development and deployment.

Approach chosen (amongst the 3 mentioned):

"Recreate our code on a jupyter notebook or google collab notebook We already have some code that takes audio files and converts them into vectors. We also have code that takes these vectors and clusters them. I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working on your notebook we can try out different pretrained models to evaluate performance."

Link to my first approach: My Colab Notebook
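For readers without access to the notebook, the idea in the issue title (chroma vectors from wav audio, grouped with K-means) can be sketched roughly as follows. This is a NumPy-only illustration and not the notebook's actual code: the helper names (`load_wav_mono`, `chroma_vector`, `kmeans`, `tone`), the frame sizes, the frequency limits and the farthest-point initialisation are all assumptions made for the example, and synthetic sine tones stand in for real recordings.

```python
import wave
import numpy as np

def load_wav_mono(path):
    """Read a 16-bit PCM wav file; return (samples scaled to [-1, 1], sample_rate)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wf.getframerate()
        data = np.frombuffer(wf.readframes(wf.getnframes()), dtype="<i2")
        data = data.reshape(-1, wf.getnchannels()).mean(axis=1)  # downmix to mono
    return data / 32768.0, sample_rate

def chroma_vector(signal, sample_rate, frame_size=4096, hop=2048):
    """Return a 12-bin chroma vector (index 0 = C), summed over frames, L2-normalised."""
    window = np.hanning(frame_size)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    valid = (freqs >= 27.5) & (freqs <= 5000.0)  # assumed useful pitch range
    # MIDI note number of each FFT bin, folded onto the 12 pitch classes
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    chroma = np.zeros(12)
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        power = np.abs(np.fft.rfft(frame))[valid] ** 2
        chroma += np.bincount(pitch_class, weights=power, minlength=12)
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

def tone(freq, sample_rate=22050, seconds=1.0):
    """Synthetic sine tone, used only as a stand-in for a real recording."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return np.sin(2 * np.pi * freq * t)

sr = 22050
clips = [tone(f, sr) for f in (440.0, 261.63, 880.0, 523.25)]  # A, C, A, C
features = np.array([chroma_vector(clip, sr) for clip in clips])
labels, _ = kmeans(features, k=2)
print(labels)  # the two A tones should share one label, the two C tones the other
```

With real data, each clip would come from `load_wav_mono(path)` instead of `tone(...)`. A library such as librosa or scikit-learn would normally replace the hand-written chroma and K-means steps.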

Background: Reference

Future Trial Work on this approach:

Data pre-processing using various techniques, data augmentation, and support for all audio formats, including wav, mp3, etc.
Support Vector Machines (SVMs): SVMs can be used for genre classification, artist classification, and mood classification. They work by finding a hyperplane that best separates the data points belonging to different classes.
Deep Neural Networks (DNNs): DNNs are a class of artificial neural networks that have multiple hidden layers. They can be used for a wide variety of tasks, including music genre classification and music recommendation. Convolutional Neural Networks (CNNs) are a specific type of DNN that is well suited to audio represented as a spectrogram, because they pick up local patterns across time and frequency.
Hidden Markov Models (HMMs): HMMs are statistical models that can be used to model sequences of events. They can be used for music genre classification, music segmentation (dividing a song into different sections), and music rhythm analysis.

Please share any feedback on this work so that I can dig deeper into it. @dennyabrain @duggalsu

dennyabrain commented 5 months ago

Hi @Shashankss1205, thank you for taking the first step. We will take some time to review it and get back to you.

I'd like to suggest some things to keep in mind. The dataset we want to analyse will consist of data collected from social networks relevant to the Indian context, so analysis of musical features and genre classification is not an intended use case.

I would also like to propose two changes to the notebook:

Ideally we'd like to mount our own Google Drive to your notebook and run it against our dataset. Could you document somewhere the directory structure and naming convention your code expects, so it's easy for us to run it against our dataset?
While I see that you have implemented some clustering in your notebook, it would be nice if you added some code so we could preview/hear the audio files in a cluster within the notebook itself. This would help us evaluate how good the clustering is for a particular use case.
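The two requests above could be met with something like the sketch below. Everything in it is an assumption for illustration: the helper names (`find_wav_files`, `group_by_cluster`, `preview_cluster`) are hypothetical, and the only expected layout is "any folder containing .wav files at any depth", which may differ from what the notebook actually requires.

```python
from collections import defaultdict
from pathlib import Path

# In Colab, a Drive folder would first be mounted with:
#   from google.colab import drive
#   drive.mount("/content/drive")
# and the dataset root would then be a folder under that mount point.

def find_wav_files(root):
    """Return every .wav file under root (any depth), sorted for a stable order."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() == ".wav")

def group_by_cluster(paths, labels):
    """Map each cluster label to the list of files assigned to it."""
    clusters = defaultdict(list)
    for path, label in zip(paths, labels):
        clusters[int(label)].append(path)
    return dict(clusters)

def preview_cluster(clusters, label, limit=5):
    """Show inline audio players for up to `limit` files of one cluster.

    Only works inside a Jupyter/Colab notebook, hence the local import.
    """
    from IPython.display import Audio, display
    for path in clusters.get(label, [])[:limit]:
        print(path.name)
        display(Audio(filename=str(path)))
```

In a notebook, `preview_cluster(group_by_cluster(paths, labels), 0)` would list and play the first few files of cluster 0, where `labels` comes from whatever clustering step was run on `paths`.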

Shashankss1205 commented 5 months ago

Thank you @dennyabrain for your valuable feedback. I will look into the changes you suggested and update my code.

Shashankss1205 commented 4 months ago

Hello @dennyabrain, sorry I wasn't available because of my end-semester exams. From what I have seen in the comments on the issue, we have to classify the dataset based on contextual references rather than tone, music, etc. I therefore think there's no use in updating my previous notebook, which was intended for a different use case.

I am creating a new pipeline that takes audio as input, transcribes the speech in it to text, and then clusters the transcripts into the intended classes. If you think this is a viable way forward, I would like to start working on it.
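As a rough sketch of what such a pipeline might look like: the speech-to-text step is left as a hypothetical `transcribe` callable (no particular model is implied), the text features are a hand-rolled TF-IDF, and the clustering is a small K-means with farthest-point initialisation. All names, the file names and the sample transcripts are made up for illustration.

```python
import math
from collections import Counter
import numpy as np

def tfidf_matrix(texts):
    """Build L2-normalised TF-IDF rows from whitespace-tokenised, lower-cased texts."""
    docs = [text.lower().split() for text in texts]
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    doc_freq = Counter(word for doc in docs for word in set(doc))
    matrix = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            idf = math.log((1 + len(docs)) / (1 + doc_freq[word])) + 1
            matrix[row, index[word]] = count * idf
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1, norms)

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm; the first centroid is points[0], the rest are farthest points."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels

def cluster_by_transcript(audio_paths, transcribe, k):
    """Transcribe each file, vectorise the text, and return (path, text, label) triples."""
    transcripts = [transcribe(path) for path in audio_paths]
    labels = kmeans(tfidf_matrix(transcripts), k)
    return list(zip(audio_paths, transcripts, labels.tolist()))

# Stand-in for a real speech-to-text model: a lookup of made-up transcripts.
fake_transcripts = {
    "clip1.wav": "flood relief camp news",
    "clip2.wav": "flood relief update",
    "clip3.wav": "cricket match score",
    "clip4.wav": "cricket score today",
}
results = cluster_by_transcript(list(fake_transcripts), fake_transcripts.get, k=2)
for path, text, label in results:
    print(label, path, text)
```

Keeping `transcribe` as a plain callable means the clustering half can be tested without any speech model, and different transcription back ends can be swapped in later.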

Thanks and regards, Shashank Shekhar Singh

aatmanvaidya commented 3 months ago

Closing this issue because the DMP program has started.

tattle-made / feluda

[DMP 2024]: K-means clustering of chroma vectors generated from audio data in wav format #271