Open Cortexelus opened 3 years ago
this is definitely helpful, and narrows down the genre search
however, this is not the list of artist/genre labels from the training data.
for example, the text file lists tony martin: ['deep adult standards']
and while tony_martin is in v2_artists, neither deep, adult, nor standards is in v2_genres
Seems like this text file is only an approximation, possibly based on a scrape from here or here
It's possible these genre tags were the origin of the training data labels; but the training data further filters them out. For example, some genres would be unknown_genre. And in 5b the genre names are first split by space/punctuation into words, and not all words are labels.
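To make the split-then-lookup check above concrete, here is a minimal sketch. It is not the Jukebox code: the split regex, the helper names (split_genre, check_labels), and the tiny vocab sets are all assumptions for illustration. In practice you would load the real v2_artists and v2_genres lists in place of the stand-ins.

```python
import re

def split_genre(name):
    """Lowercase a name and split it on any run of non-alphanumeric characters.

    Assumption: this only approximates 'split by space/punctuation'; the real
    normalization may differ.
    """
    return [w for w in re.split(r"[^a-z0-9]+", name.lower()) if w]

def check_labels(artist, genre, artist_vocab, genre_vocab):
    """Report which parts of an artist/genre pair appear in the given vocab sets."""
    artist_key = "_".join(split_genre(artist))  # e.g. "tony martin" -> "tony_martin"
    words = split_genre(genre)
    return {
        "artist_known": artist_key in artist_vocab,
        "known_words": [w for w in words if w in genre_vocab],
        "unknown_words": [w for w in words if w not in genre_vocab],
    }

# Hypothetical stand-ins for the real label lists (replace with the actual contents).
artist_vocab = {"tony_martin"}
genre_vocab = {"jazz", "pop"}

report = check_labels("tony martin", "deep adult standards", artist_vocab, genre_vocab)
print(report)
```

Running this over every line of the text file would show which artist/genre pairs have at least one word that is an actual label and which would fall through to unknown.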
The data was scraped from Spotify. It is meant as an aid for finding nice settings, and as you note, the genres do not always agree with the training data labels. Here is the original: https://github.com/deeplearn-art/jukebox/blob/master/artist_genres.txt
Certain genres work better with certain artists (and sometimes the best genre is "unknown"). I hypothesize we're getting better results when we generate using the "correct" artist/genre pairs from the training data. Knowing what these pairs are would save us weeks of time testing!