Genre prediction

trinayan commented 6 years ago

Hi,

I am trying to use FMA for my project work and it seems almost half of the genre information in the data is NaN. How do you recommend we deal with these?

Thanks

mdeff commented 6 years ago

I assume you're talking about the genre_top column, which indeed has 47% coverage (see Table 2). That is because only 47% of the tracks have a single root genre (others have several; e.g. a track could be both Rock and Experimental). If you consider multi-label predictions, then 98% of the tracks are associated with at least one genre (see the genres and genres_all columns in Table 2). Please see section 2.4 of the paper for a description of how those columns relate to each other.
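To make that relationship concrete, here is a minimal sketch in Python. It assumes genre_top is defined only when a track's genres_all contains exactly one root genre, as described above. The genre names, the ROOT_GENRES set, the toy TRACKS, and the helper functions are hypothetical illustrations, not the dataset's loading code; the real columns may store genre IDs rather than names.

```python
from typing import Optional

# Hypothetical root (top-level) genres; the real FMA taxonomy has more.
ROOT_GENRES = {"Rock", "Experimental", "Hip-Hop", "Electronic"}

# Hypothetical toy tracks: genres_all lists every genre a track belongs to,
# including the ancestors of its sub-genres (e.g. Punk implies Rock).
TRACKS = {
    1: ["Rock", "Punk"],
    2: ["Rock", "Experimental"],
    3: ["Hip-Hop"],
    4: ["Electronic", "Techno", "Experimental"],
}


def top_genre(genres_all: list) -> Optional[str]:
    """Return the root genre if it is unique, else None (the NaN case)."""
    roots = {g for g in genres_all if g in ROOT_GENRES}
    return roots.pop() if len(roots) == 1 else None


def coverage(labels) -> float:
    """Fraction of tracks whose label is defined (not None)."""
    labels = list(labels)
    return sum(label is not None for label in labels) / len(labels)


genre_top = {tid: top_genre(g) for tid, g in TRACKS.items()}
print(genre_top)  # {1: 'Rock', 2: None, 3: 'Hip-Hop', 4: None}
print(coverage(genre_top.values()))  # 0.5

# For a single-label problem, keep only tracks with a unique root genre.
single_label = {tid: g for tid, g in genre_top.items() if g is not None}
```

Tracks 2 and 4 get no top genre because they span two root genres, which mirrors why genre_top is missing for roughly half of the real tracks while genres_all is almost always populated.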

I would recommend choosing one of the problems proposed in the paper.

We propose the following prediction problems of increasing difficulty:

1. Single top genre on the balanced small subset.
2. Single top genre on the unbalanced medium subset.
3. Multiple top genres on the large / full set.
4. Multiple (sub-)genres on the large / full set.
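For the two multi-label problems (multiple top genres, multiple (sub-)genres), one common way to build targets is a multi-hot vector per track. The sketch below assumes hypothetical ROOT_GENRES and ALL_GENRES vocabularies and a toy genres_all list; it is not taken from the FMA code. Restricting the vocabulary to root genres yields the top-genre target, while the full vocabulary yields the (sub-)genre target.

```python
# Hypothetical, fixed vocabulary orderings; in practice these would be
# derived from the dataset's genre table.
ROOT_GENRES = ["Electronic", "Experimental", "Hip-Hop", "Rock"]
ALL_GENRES = ["Electronic", "Experimental", "Hip-Hop", "Punk", "Rock", "Techno"]


def multi_hot(labels, vocabulary):
    """Encode a collection of labels as a 0/1 list over `vocabulary`.

    Labels not in the vocabulary are ignored, so passing ROOT_GENRES
    keeps only the top-level genres.
    """
    present = set(labels)
    return [1 if g in present else 0 for g in vocabulary]


# Hypothetical genres_all value for one track.
track_genres_all = ["Electronic", "Techno", "Experimental"]

top_target = multi_hot(track_genres_all, ROOT_GENRES)  # multiple top genres
all_target = multi_hot(track_genres_all, ALL_GENRES)   # multiple (sub-)genres
print(top_target)  # [1, 1, 0, 0]
print(all_target)  # [1, 1, 0, 0, 0, 1]
```

A track with no genres at all encodes to an all-zero vector, so such tracks may need to be dropped or handled explicitly before training.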


trinayan commented 6 years ago

Thanks for the information

mdeff / fma

Genre prediction #11