mdeff / fma

FMA: A Dataset For Music Analysis
https://arxiv.org/abs/1612.01840
MIT License

Genre prediction #11

Closed trinayan closed 6 years ago

trinayan commented 6 years ago

Hi,

I am trying to use FMA for my project work and it seems almost half of the genre information in the data is NaN. How do you recommend we deal with these?

Thanks

mdeff commented 6 years ago

I assume you're talking about the genre_top column, which indeed has 47% coverage (see Table 2). That is because only 47% of the tracks have a single root genre, while the others might have several (e.g. a track could be both Rock and Experimental). If you consider multi-label prediction, 98% of the tracks are associated with at least one genre (see the genres and genres_all columns in Table 2). Please see section 2.4 of the paper for a description of how those columns relate to each other.
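
To make the coverage numbers concrete, here is a minimal sketch of how such a check could look. The rows are made up for illustration and are not taken from the dataset; the helper names `is_missing` and `coverage` are hypothetical, and the column names simply mirror genre_top and genres from the discussion above.

```python
import math

# Toy stand-in for a few rows of track metadata (values are invented).
tracks = [
    {"track_id": 1, "genre_top": "Rock", "genres": ["Rock"]},
    {"track_id": 2, "genre_top": None, "genres": ["Rock", "Experimental"]},
    {"track_id": 3, "genre_top": "Jazz", "genres": ["Jazz"]},
    {"track_id": 4, "genre_top": None, "genres": []},
]


def is_missing(value):
    """True for None or a float NaN (how CSV readers often encode gaps)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def coverage(rows, column):
    """Fraction of rows with a non-missing, non-empty value in `column`."""
    filled = sum(
        1 for row in rows
        if not is_missing(row[column]) and row[column] != []
    )
    return filled / len(rows)


top_cov = coverage(tracks, "genre_top")    # share of tracks with one root genre
multi_cov = coverage(tracks, "genres")     # share of tracks with any genre

# For a single-label problem, keep only tracks that have a root genre.
single_label = [row for row in tracks if not is_missing(row["genre_top"])]

print(f"genre_top coverage: {top_cov:.0%}, genres coverage: {multi_cov:.0%}")
```

Track 2 shows why genre_top can be empty even though the track is labeled: it belongs to two root genres, so it only becomes usable once the task is treated as multi-label.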

I would recommend choosing one of the problems proposed in the paper.

We propose the following prediction problems of increasing difficulty:

  1. Single top genre on the balanced small subset.
  2. Single top genre on the unbalanced medium subset.
  3. Multiple top genres on the large / full set.
  4. Multiple (sub-)genres on the large / full set.
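
The practical difference between the single-genre and multiple-genre problems is the shape of the target. As a rough sketch (the genre vocabulary and the helpers `class_index` and `multi_hot` are assumptions for illustration, not part of the FMA code), a single top genre maps to one class index, while multiple genres map to a multi-hot vector:

```python
# Hypothetical genre vocabulary; the real dataset has many more genres.
vocabulary = sorted({"Rock", "Experimental", "Jazz"})
index = {genre: i for i, genre in enumerate(vocabulary)}


def class_index(genre_top):
    """Single-label target: one integer class per track."""
    return index[genre_top]


def multi_hot(genres):
    """Multi-label target: a 0/1 entry per genre in the vocabulary."""
    vector = [0] * len(vocabulary)
    for genre in genres:
        vector[index[genre]] = 1
    return vector


single_target = class_index("Jazz")
multi_target = multi_hot(["Rock", "Experimental"])

print(vocabulary, single_target, multi_target)
```

With the multi-hot form, a track that is both Rock and Experimental keeps both labels instead of being dropped for lacking a single root genre.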


trinayan commented 6 years ago

Thanks for the information