rit-ai / serenade

Let's see if we can do fun things with music

Look at Josh's Dataset #10

Closed by dogwaterdev1 7 years ago

DBowald commented 7 years ago
dogwaterdev1 commented 7 years ago

Apparently, while GTZAN is seen as a default benchmark for a lot of MGR (music genre recognition) work, it also has some serious faults we should talk about and take into account.

But as a precursor to that: GTZAN format is __; I haven't had the chance to download it just yet. (It's a ~1.2 GB download, and there's also a music/speech collection at ~300 MB.) https://marsyasweb.appspot.com/download/data_sets/ P.S. There's also a contact link at the bottom of the page for gtzan@cs.uvic.ca

The dataset consists of 1000 audio tracks, each 30 seconds long, in mono 16-bit .wav format. There are 10 genres, with 100 tracks per genre. But according to several papers, GTZAN suffers from the following:

- Repetitions
- Mislabelings
- Distortions in the music

According to Bob L. Sturm at Aalborg University, 10-11% of the dataset is mislabeled. (Much of the tagging in the preliminary papers I found revolves around many users' genre classifications on last.fm.) I'll post again as I get more in depth with the subject. The sources are as follows:

- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1034.3092&rep=rep1&type=pdf
- https://arxiv.org/pdf/1306.1461.pdf
- https://stackoverflow.com/questions/11465880/gtzan-music-genre-dataset
- http://www.eecs.qmul.ac.uk/~sturm/research/GTZANtable2/index.html

Note: These are just my preliminary findings on the subject, but they are the most important points I could find.
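Once the download is on disk, the format claims above (mono, 16-bit, roughly 30 seconds, 100 tracks per genre) could be checked with something like the sketch below. It uses only the standard library. The `<root>/<genre>/<track>.wav` folder layout and the names `describe_wav` and `audit_dataset` are assumptions made for illustration, not something confirmed about the actual archive.

```python
import wave
from collections import Counter
from pathlib import Path


def describe_wav(path):
    """Return (channels, bits_per_sample, duration_seconds) for one WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        duration = wav.getnframes() / wav.getframerate()
    return channels, bits, duration


def audit_dataset(root, expected_seconds=30.0, tolerance=1.0):
    """Count tracks per genre folder and list files that break the expected format.

    Assumes a hypothetical layout of <root>/<genre>/<track>.wav.
    """
    counts = Counter()
    problems = []
    for path in sorted(Path(root).glob("*/*.wav")):
        counts[path.parent.name] += 1
        channels, bits, duration = describe_wav(path)
        wrong_format = channels != 1 or bits != 16
        wrong_length = abs(duration - expected_seconds) > tolerance
        if wrong_format or wrong_length:
            problems.append((path.name, channels, bits, round(duration, 2)))
    return counts, problems
```

If the description holds, `counts` should show 10 genres with 100 tracks each and `problems` should be empty. This only validates the container format; it would not catch the repetitions or mislabelings mentioned above.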

vlraik commented 7 years ago

@DBowald What do you mean by "what features does GTZAN contain"? If we are given raw audio, we can manipulate it and extract all sorts of features ourselves using Librosa: http://librosa.github.io/librosa/