sigsep / sigsep-mus-2018-analysis

Analysis and Visualization for SiSEC 2018

Correct post-hoc test for significance? #2

Closed by f90 6 years ago

f90 commented 6 years ago

Hello,

I noticed that the Conover post-hoc test yields p-values very close to zero, which seems a bit unusual given that the box plots show quite a lot of overlap in SDR values between the methods. Could this be because the test is currently computed by taking a vector of all segment-wise observations from each method and comparing them, ignoring that some segments are correlated because they belong to the same song? This is how it looks to me at least:

sp.posthoc_conover(df_voc, val_col='score', group_col='estimate')

I don't know stats very well, but could it be that we need to apply a blocked design, in which the segment-wise observations from the same song are put into one block? I think block assignments are supported by the Conover method.
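To make the blocking idea concrete, here is a minimal pure-Python sketch (not code from this repository): each song is treated as one block, methods are ranked within each song, and the rank sums feed the Friedman statistic, which is the usual omnibus rank test for a blocked design. The scores, the helper names `average_ranks` and `friedman_statistic`, and the assumption of one score per method per song are all made up for illustration, and the statistic is computed without a tie correction.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks: list of per-song score lists, one score per method,
    with the methods in the same order in every song.
    """
    n = len(blocks)      # number of blocks (songs)
    k = len(blocks[0])   # number of treatments (methods)
    rank_sums = [0.0] * k
    for scores in blocks:
        # Ranking inside the block removes song-level difficulty differences.
        for j, rank in enumerate(average_ranks(scores)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Hypothetical per-song scores for three methods (one row per song).
blocks = [
    [5.1, 4.0, 6.2],
    [2.0, 1.5, 3.3],
    [7.4, 6.9, 7.0],
    [3.2, 2.8, 4.1],
]
q = friedman_statistic(blocks)
print(q)
```

Because ranks are assigned within each song, a song that is hard for every method no longer inflates the apparent difference between methods, which is the point of the blocked design.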

faroit commented 6 years ago

Yes, of course, thanks for the pointer. Also, the post-hoc tests would not work for an uneven number of groups, so they would not have run on the submissions (compared to the oracles).

We now use the median as the aggregation for all plots, as recommended by some statistics papers (not my area of expertise either ;-)
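As a rough illustration of median aggregation, the sketch below collapses segment-wise scores into one median per song and method. It assumes the aggregation is done over the segments of each song, which the comment above does not state explicitly. The rows are invented, and the `track` field and the `median_per_track` helper are hypothetical names. Only `estimate` and `score` mirror the column names used in the call quoted earlier in this thread.

```python
from collections import defaultdict
from statistics import median

# Hypothetical segment-wise rows: (track, estimate, score).
rows = [
    ("song_a", "method_1", 5.0),
    ("song_a", "method_1", 7.0),
    ("song_a", "method_1", 6.0),
    ("song_a", "method_2", 4.0),
    ("song_a", "method_2", 8.0),
    ("song_b", "method_1", 2.0),
    ("song_b", "method_1", 3.0),
    ("song_b", "method_2", 1.0),
    ("song_b", "method_2", 2.0),
]


def median_per_track(rows):
    """Reduce segment-wise scores to one median per (track, estimate)."""
    grouped = defaultdict(list)
    for track, estimate, score in rows:
        grouped[(track, estimate)].append(score)
    return {key: median(scores) for key, scores in grouped.items()}


track_scores = median_per_track(rows)
for (track, estimate), value in sorted(track_scores.items()):
    print(track, estimate, value)
```

After this step each method contributes a single observation per song, so correlated segments from the same song are no longer counted as independent samples.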

I've updated the Jupyter notebook, which includes all the results that were used to create the SiSEC evaluation. You can also run the notebook on Google Colab.