johnnybuggy / HOLO

HOLO - The Music Amalgamation System

Test samples #10

Closed by PavelTorgashov 10 years ago

PavelTorgashov commented 10 years ago

We need a set of test samples. We should create automated tests that measure how adequate the various algorithms are. I propose selecting 10 (or 20, or 100) audio tracks of various genres and then manually assigning pairwise similarity estimates (or dividing the tracks into clusters). Next, we create automated tests that measure how closely each algorithm matches the human estimates. That would be the most objective way to rate the algorithms.
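One possible way to score an algorithm against human estimates is a rank correlation between the two sets of pairwise values. The sketch below is only an illustration, not something taken from HOLO: the names `agreement`, `human_ratings`, and `similarity` are hypothetical, and Spearman correlation is an assumed choice of metric. It assumes human ratings are stored per track pair and that the algorithm exposes a function returning a similarity score for two tracks.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def agreement(human_ratings, similarity):
    """Spearman correlation between human ratings and algorithm scores.

    human_ratings: {(track_a, track_b): rating}  (hypothetical layout)
    similarity:    callable (track_a, track_b) -> float, higher = more similar
    """
    pairs = sorted(human_ratings)
    human = [human_ratings[p] for p in pairs]
    algo = [similarity(a, b) for a, b in pairs]
    return pearson(average_ranks(human), average_ranks(algo))
```

A value near 1 would mean the algorithm orders the pairs the same way the human raters do; a value near 0 would mean no relationship. An automated test could then assert that each algorithm stays above some agreed threshold.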

vsoldatkin commented 10 years ago

That's a very good proposal. It may be helpful to look at http://freemusicarchive.org/ and take samples from there. Their tracks seem to be under a free license for various uses.

johnnybuggy commented 10 years ago

Good idea; it was also on my mind, though I did this manually each time after implementing a new algorithm. Let's choose 50 tracks per contributor, join them, and gather similarity votes from each peer. Thus, we should have 3 * 50 * (50-1) / 2 = 3675 votes for full matrix coverage. I think this is possible to carry out within 3-4 weeks.
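The vote count above can be checked with a short calculation. This is a sketch with a hypothetical helper name (`votes_needed`); it assumes every rater votes once on every unordered pair of tracks.

```python
from math import comb


def votes_needed(n_tracks, n_raters):
    """Votes required when each rater rates every unordered pair of tracks."""
    return n_raters * comb(n_tracks, 2)


print(votes_needed(50, 3))   # 3675, the figure quoted above
print(votes_needed(150, 3))  # 33525, if all 3 * 50 tracks are pooled into one list
```

The second line shows how quickly the number grows if the 150 tracks are rated as a single pool, since the number of pairs rises roughly with the square of the track count.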

johnnybuggy commented 10 years ago

Thoughts regarding tracks. Does everybody have a Dropbox account? I can create a Dropbox folder with my tracks; then you can share your tracks with me, I will add them to the same folder and share the whole folder with each of you. I propose these requirements for tracks:

PavelTorgashov commented 10 years ago

> Let's choose 50 tracks per contributor

No, I suggest rating a single list of files. Each contributor must rate this list, so we get 3 ratings per file. Then we can calculate the average rating, which will be more objective.

Furthermore, we do not necessarily have to rate all possible pairs. We can choose some arbitrary number of pairs, because the full number of pairs is very large.
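A minimal sketch of both ideas, drawing a random subset of pairs from a single track list and averaging the ratings from several raters, might look like this. The function names and the `{rater: {pair: rating}}` layout are assumptions made for illustration only.

```python
import random
from itertools import combinations
from statistics import mean


def sample_pairs(tracks, n_pairs, seed=0):
    """Pick n_pairs distinct unordered track pairs (all pairs if fewer exist)."""
    all_pairs = list(combinations(tracks, 2))
    rng = random.Random(seed)  # fixed seed so every rater gets the same pairs
    return rng.sample(all_pairs, min(n_pairs, len(all_pairs)))


def average_ratings(ratings_by_rater):
    """Collapse {rater: {pair: rating}} into {pair: mean rating}."""
    collected = {}
    for ratings in ratings_by_rater.values():
        for pair, rating in ratings.items():
            collected.setdefault(pair, []).append(rating)
    return {pair: mean(values) for pair, values in collected.items()}
```

Fixing the seed is a design choice here: it keeps the sampled pair list identical for all raters, so each pair ends up with one rating per rater.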

PS Even better would be ratings from strangers rather than from contributors. Only I do not know how we can do that :)

PavelTorgashov commented 10 years ago

Also, I propose the following rating scale:

1 Absolutely dissimilar music of other genre
2 Other music genre
3 Music has a similar genre
4 Music is very similar
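If votes are collected in a script, the scale could be encoded so that out-of-range values are rejected early. This is only a sketch; the enum and member names are hypothetical and chosen to mirror the four levels above.

```python
from enum import IntEnum


class Similarity(IntEnum):
    """Four-level rating scale; names are illustrative, values match the scale."""
    DISSIMILAR_OTHER_GENRE = 1  # absolutely dissimilar music of other genre
    OTHER_GENRE = 2             # other music genre
    SIMILAR_GENRE = 3           # music has a similar genre
    VERY_SIMILAR = 4            # music is very similar


def parse_vote(value):
    """Convert raw input to a Similarity; raises ValueError outside 1..4."""
    return Similarity(int(value))
```

Because `IntEnum` members behave like integers, the parsed votes can be averaged or ranked directly.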
