mf-caglar / song_analysis_project


Relevant Questions #5

Open taha-ismet opened 4 weeks ago

taha-ismet commented 4 weeks ago

1) What are some good song similarity metrics? **1-a)** For songs $s_1$ and $s_2$ on a platform, for $i=1,2$, let $S_i$ be the set of playlists that contain $s_i$. Is the Jaccard index of $S_1$ and $S_2$, defined as $J(S_1,S_2) = \frac{|S_1\cap S_2|}{|S_1\cup S_2|}$, a good measure of the similarity of songs $s_1$ and $s_2$? Or is there another set similarity measure $d$ such that $d(S_1,S_2)$ is a meaningful similarity measure for $s_1$ and $s_2$?

**1-b)** What is the connection between genres and mood descriptors? Can we use one as a sub-predictor of the other, or each of the other in different circumstances? In other words, how similar are the songs that lie in the intersection of a genre $G_i$ and a mood descriptor $M_j$? If these intersections show consistent patterns, maybe we can use one as a sub-predictor of the other.

2) Do being on the same playlist and having close values for the song parameters from the Spotify API go hand in hand?

3) Do we need to classify subgenres to assign mood descriptors? If we do, and I suspect we do, it is a really hard task, as various academics have noted, because time matters here. For example, there are very many subgenres under Electronic Dance Music (EDM) and Heavy Metal. See this wiki. Some genres evolve quickly over time. Maybe we can create a metric that denotes the velocity of a genre's evolution and weight songs with it.
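The Jaccard index from 1-a is straightforward to compute directly from the playlist sets. A minimal sketch (the playlist IDs below are made up for illustration):

```python
def jaccard(s1: set, s2: set) -> float:
    """J(S1, S2) = |S1 ∩ S2| / |S1 ∪ S2|; defined as 0.0 when both sets are empty."""
    union = s1 | s2
    if not union:
        return 0.0
    return len(s1 & s2) / len(union)

# S_i = set of playlists that contain song s_i (hypothetical IDs)
S1 = {"p1", "p2", "p3", "p4"}
S2 = {"p3", "p4", "p5"}
print(jaccard(S1, S2))  # 2 shared playlists / 5 total = 0.4
```

One known caveat when using this as a song similarity measure: for a very popular song, $|S_1 \cup S_2|$ becomes large, so the Jaccard index can understate similarity between songs with very different popularity levels.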

mf-caglar commented 3 weeks ago

Probably we don't have the chance to use any approach other than content-based, since we only have content data. The similarity measures you discussed in Question 1 are part of this approach. I see that there are different metrics, and the Jaccard index is one of them, alongside alternatives such as the Dice coefficient and the overlap coefficient.

The choice of the methods used for content representation learning is highly dependent on the content format. Early works used tf-idf, decision trees, and linear classifiers to model the content [25]. Nowadays, with the development of neural networks, it is a more common choice to handle the content with deep neural networks. If we have a table of categorical features, the aforementioned methods such as DeepFM and the Wide & Deep learning model would be a good fit. Other neural architectures such as convolutional neural networks, autoencoders, and transformers [27] can be used for more complex features. Specifically, these methods are especially effective for multimedia data sources such as text [29], image, audio, and even video. For example, convolutional neural networks with flexible convolution and pooling operations are effective at capturing the spatial and temporal dependencies in images and texts. A number of well-defined CNN architectures such as GoogLeNet and ResNet [17, 10] are ready for use. Readers are referred to the survey [11] for more detailed descriptions of deep learning solutions for these tasks.
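To make the content-based idea concrete, here is a minimal sketch of cosine similarity over feature vectors. The song names and feature values below are made up; in practice the vectors could be Spotify API audio features or embeddings produced by one of the neural encoders mentioned above:

```python
import numpy as np

# Hypothetical content feature vectors per song (e.g. danceability, energy,
# valence, acousticness) -- placeholder values, not real API output.
features = {
    "song_a": np.array([0.8, 0.7, 0.6, 0.5]),
    "song_b": np.array([0.7, 0.8, 0.5, 0.6]),
    "song_c": np.array([0.1, 0.2, 0.9, 0.3]),
}

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(features["song_a"], features["song_b"]))
print(cosine_similarity(features["song_a"], features["song_c"]))
```

Unlike the Jaccard index, this operates on the songs' own content rather than on playlist co-membership, so it works even for new songs that appear in no playlists yet.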

mf-caglar commented 3 weeks ago

Question 3

taha-ismet commented 2 weeks ago

To assess question 2, I made a notebook: https://colab.research.google.com/drive/1oHT75nqfXJFCNNGlzexFNaGmq8y61ZSr?usp=sharing
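For reference, the kind of check question 2 calls for can be sketched as follows. All data here are toy values (hypothetical song and playlist IDs, made-up feature vectors), not output from the notebook: for every song pair, compare playlist overlap (Jaccard) with feature distance, then correlate the two.

```python
import numpy as np
from itertools import combinations

# Toy playlist memberships and Spotify-style feature vectors per song.
playlists = {
    "s1": {"p1", "p2", "p3"},
    "s2": {"p2", "p3", "p4"},
    "s3": {"p7", "p8"},
}
features = {
    "s1": np.array([0.8, 0.6, 0.5]),
    "s2": np.array([0.7, 0.6, 0.4]),
    "s3": np.array([0.1, 0.9, 0.2]),
}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

sims, dists = [], []
for u, v in combinations(playlists, 2):
    sims.append(jaccard(playlists[u], playlists[v]))
    dists.append(float(np.linalg.norm(features[u] - features[v])))

# If playlist co-membership and feature closeness "go hand in hand",
# this correlation should be clearly negative (more overlap, smaller distance).
r = np.corrcoef(sims, dists)[0, 1]
print(r)
```

On real data the interesting part is how strong (and how stable across genres) this correlation actually is, which is what the notebook can investigate.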