openai / jukebox

Code for the paper "Jukebox: A Generative Model for Music"
https://openai.com/blog/jukebox/

What are the "correct" artist/genre pairs from the training data? #235

Open Cortexelus opened 3 years ago

Cortexelus commented 3 years ago

Certain genres work better with certain artists (and sometimes the best genre is "unknown"). I hypothesize we're getting better results when we generate using the "correct" artist/genre pairs from the training data. Knowing what these pairs are would save us weeks of time testing!

Gfdshui commented 3 years ago

https://paste.in/igiAEJ

Cortexelus commented 3 years ago

this is definitely helpful, and narrows down the genre search

however, this is not the list of artist/genre labels from the training data.

for example, the text file lists tony martin: ['deep adult standards'] and while tony_martin is in v2_artists, neither deep adult nor standards are in v2_genres

Seems like this text file is only an approximation, possibly based on a scrape from here or here

It's possible these genre tags were the origin of the training data labels, but the training data filters them further. For example, some genres would be unknown_genre. And in 5b the genre names are first split by space/punctuation into words, and not all words are labels.
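A rough sketch of the check described above, in case it helps anyone compare the scraped list against the label vocabularies. Everything here is an assumption for illustration: the function names are made up, the vocabularies are passed in as plain Python sets (load them however you like from the v2 label lists), and the split rule is only my guess at "split by space/punctuation", not the repo's actual tokenizer.

```python
import re


def split_genre_words(text):
    """Lowercase a tag and split it on any run of characters that is not a-z or 0-9.

    Assumption: this approximates "split by space/punctuation". Non-ASCII
    letters are treated as separators here, which may differ from the repo.
    """
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def check_pairs(scraped, known_artists, known_genre_words):
    """Report which scraped artists and genre words appear in the given vocabularies.

    scraped: dict mapping artist name -> list of genre tags (e.g. the scraped text file)
    known_artists: set of artist labels in underscore form, e.g. {"tony_martin"}
    known_genre_words: set of single-word genre labels
    """
    report = {}
    for artist, genres in scraped.items():
        # Assumption: artist labels are the name's words joined by underscores.
        artist_key = "_".join(split_genre_words(artist))
        words = [w for g in genres for w in split_genre_words(g)]
        report[artist] = {
            "artist_known": artist_key in known_artists,
            "known_words": [w for w in words if w in known_genre_words],
            "unknown_words": [w for w in words if w not in known_genre_words],
        }
    return report


if __name__ == "__main__":
    # Toy vocabularies, NOT the real v2 lists.
    toy_artists = {"tony_martin"}
    toy_genre_words = {"pop", "jazz"}
    scraped = {"tony martin": ["deep adult standards"]}
    print(check_pairs(scraped, toy_artists, toy_genre_words))
```

Any artist whose unknown_words list is non-empty is one where the scraped tag can't be used as-is, so those are the entries worth testing by hand (or with the unknown genre).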

deeplearn-art commented 3 years ago

The data was scraped from Spotify. It is meant as an aid for finding nice settings, and as you note, the genres do not always agree with the training data labels. Here is the original: https://github.com/deeplearn-art/jukebox/blob/master/artist_genres.txt