mf-caglar / song_analysis_project


Relevant Questions #5

Open taha-ismet opened 4 weeks ago

taha-ismet commented 4 weeks ago

1) What are some good song similarity metrics? **1-a)** For songs $s_1$ and $s_2$ on a platform, for $i=1,2$, let $S_i$ be the set of playlists that contain $s_i$. Is the Jaccard index of $S_1$ and $S_2$, defined as $J(S_1,S_2) = \frac{|S_1\cap S_2|}{|S_1\cup S_2|}$, a good measure of the similarity of songs $s_1$ and $s_2$? Or is there another set similarity measure $d$ such that $d(S_1,S_2)$ is a meaningful similarity measure for $s_1$ and $s_2$?

**1-b)** What is the connection between genres and mood descriptors? Can we use one as a sub-predictor of the other, or each of the other in different circumstances? In other words, how similar are the songs that lie in the intersection of a genre $G_i$ and a mood descriptor $M_j$? If these intersections show consistent patterns, maybe we can use one as a sub-predictor of the other.

2) Do being on the same playlist and having close values for the song parameters from the Spotify API go hand in hand?

3) Do we need to classify subgenres to assign mood descriptors? If we do, and I suspect we do, it is a really hard task, as various academics have noted, because time matters here. For example, there are very many subgenres under Electronic Dance Music (EDM) and Heavy Metal. See this wiki. Some genres evolve quickly over time. Maybe we can create a metric that denotes the velocity of a genre's evolution and weight songs with it.
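The Jaccard index from 1-a is straightforward to compute directly from the playlist sets. A minimal sketch (the playlist IDs below are made up for illustration):

```python
def jaccard(s1: set, s2: set) -> float:
    """J(S1, S2) = |S1 ∩ S2| / |S1 ∪ S2|; defined as 0.0 when both sets are empty."""
    union = s1 | s2
    if not union:
        return 0.0
    return len(s1 & s2) / len(union)

# S_i = set of playlists that contain song s_i (hypothetical IDs)
S1 = {"p1", "p2", "p3", "p4"}
S2 = {"p3", "p4", "p5"}
print(jaccard(S1, S2))  # 2 shared playlists / 5 total = 0.4
```

One known caveat when using this as a song similarity measure: for a very popular song, $|S_1 \cup S_2|$ becomes large, so the Jaccard index can understate similarity between songs with very different popularity levels.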

mf-caglar commented 3 weeks ago

Probably we don't have the chance to use any approach other than content-based, since we only have content data. The similarity measures you discussed in Question 1 are part of this approach. I see that there are different metrics, and the Jaccard index is one of them, alongside alternatives such as the Dice coefficient and the overlap coefficient.

The choice of the methods used for content representation learning is highly dependent on the content format. Early works used tf-idf, decision trees, and linear classifiers to model the content [25]. Nowadays, with the development of neural networks, it is a more common choice to handle the content with deep neural networks. If we have a table of categorical features, the aforementioned methods such as DeepFM and the Wide & Deep learning model would be a good fit. Other neural architectures such as convolutional neural networks, autoencoders, and transformers [27] can be used for more complex features. Specifically, these methods are especially effective for multimedia data sources such as text [29], image, audio, and even video. For example, convolutional neural networks with flexible convolution and pooling operations are effective at capturing the spatial and temporal dependencies in images and texts. A number of well-defined CNN architectures such as GoogLeNet and ResNet [17, 10] are ready for use. Readers are referred to the survey [11] for more detailed descriptions of deep learning solutions for these tasks.
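To make the content-based idea concrete, here is a minimal sketch of cosine similarity over feature vectors. The song names and feature values below are made up; in practice the vectors could be Spotify API audio features or embeddings produced by one of the neural encoders mentioned above:

```python
import numpy as np

# Hypothetical content feature vectors per song (e.g. danceability, energy,
# valence, acousticness) -- placeholder values, not real API output.
features = {
    "song_a": np.array([0.8, 0.7, 0.6, 0.5]),
    "song_b": np.array([0.7, 0.8, 0.5, 0.6]),
    "song_c": np.array([0.1, 0.2, 0.9, 0.3]),
}

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(features["song_a"], features["song_b"]))
print(cosine_similarity(features["song_a"], features["song_c"]))
```

Unlike the Jaccard index, this operates on the songs' own content rather than on playlist co-membership, so it works even for new songs that appear in no playlists yet.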

mf-caglar commented 3 weeks ago

Question 3

taha-ismet commented 2 weeks ago

To assess question 2, I made a notebook: https://colab.research.google.com/drive/1oHT75nqfXJFCNNGlzexFNaGmq8y61ZSr?usp=sharing
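For reference, the kind of check question 2 calls for can be sketched as follows. All data here are toy values (hypothetical song and playlist IDs, made-up feature vectors), not output from the notebook: for every song pair, compare playlist overlap (Jaccard) with feature distance, then correlate the two.

```python
import numpy as np
from itertools import combinations

# Toy playlist memberships and Spotify-style feature vectors per song.
playlists = {
    "s1": {"p1", "p2", "p3"},
    "s2": {"p2", "p3", "p4"},
    "s3": {"p7", "p8"},
}
features = {
    "s1": np.array([0.8, 0.6, 0.5]),
    "s2": np.array([0.7, 0.6, 0.4]),
    "s3": np.array([0.1, 0.9, 0.2]),
}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

sims, dists = [], []
for u, v in combinations(playlists, 2):
    sims.append(jaccard(playlists[u], playlists[v]))
    dists.append(float(np.linalg.norm(features[u] - features[v])))

# If playlist co-membership and feature closeness "go hand in hand",
# this correlation should be clearly negative (more overlap, smaller distance).
r = np.corrcoef(sims, dists)[0, 1]
print(r)
```

On real data the interesting part is how strong (and how stable across genres) this correlation actually is, which is what the notebook can investigate.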