Closed sbuschjaeger closed 3 years ago
Thank you for reporting! I am surprised nobody came across (or posted) this issue earlier. Perhaps everyone is happy with the default column names. I have already uploaded the fixed release.
However, I have a difficult time understanding what "block", "groups", and "y" mean in this context. More specifically, are blocks (or groups?) different classifiers or datasets, and is y the ranks or the accuracies?
You can find some explanation here. Is it what you are looking for?
You don't happen to have some example code or an explanation of how to plot CD diagrams?
Unfortunately, no. But I have found this repo. I need some time to see how such a diagram is plotted. But I guess it can be adapted to the other tests (not only Wilcoxon as in this repo).
Thanks for the fix. I guess most people just use NumPy arrays, which I now do as well. I also found the repo you mentioned, but the code is rather messy, so I decided to implement my own plotting and use your code for the statistical tests.
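For readers after the same thing, here is a minimal sketch of the two quantities a CD diagram is built from: the average rank of each classifier and the Nemenyi critical difference from Demsar (2006). The accuracy values are made up for illustration, and the sketch assumes rows are datasets and columns are classifiers. It is not the plotting code mentioned above.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical accuracies: 4 datasets (rows) x 3 classifiers (columns)
acc = np.array([
    [0.81, 0.85, 0.78],
    [0.74, 0.79, 0.70],
    [0.90, 0.88, 0.86],
    [0.66, 0.71, 0.69],
])
n_datasets, k = acc.shape

# Rank within each dataset: rank 1 goes to the highest accuracy,
# tied values receive the average of their ranks
ranks = rankdata(-acc, axis=1)
avg_ranks = ranks.mean(axis=0)

# Nemenyi critical difference: CD = q_alpha * sqrt(k (k + 1) / (6 N)).
# q_alpha = 2.343 is the tabulated value for k = 3 at alpha = 0.05;
# other values of k need a different q_alpha.
q_alpha = 2.343
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_datasets))

print(avg_ranks, cd)
```

Two classifiers whose average ranks differ by less than `cd` would be joined by a bar in the diagram.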
I saw that example in the README, and that is what causes the confusion. In your example the columns correspond to primary factors (the yield) and the rows correspond to blocking factors (the field). You then use the Friedman test to check whether there is a difference in the data, passing the transposed data matrix: `ss.friedmanchisquare(*data.T)`. After that, however, you do not transpose the data anymore, which confuses me. For my use case I have a `(19, 13)` matrix for which I want to compute the pairwise `posthoc_wilcoxon` statistics. As expected, `ss.friedmanchisquare(*data.T)` works fine (with transposed data). However, the `posthoc_wilcoxon` test seems to iterate over rows, not columns. Applying it to my data gives a `(19, 19)` output, but I expected pairwise tests across the other dimension, i.e. a `(13, 13)` output. I played around with and without transposing the data, but from the p-values I get and the resulting dimensions, it only makes sense to me to transpose my data for both calls.
Please note the asterisk that I use before `data.T`. NumPy arrays are unpacked by rows, and we have groups in columns, so we need to transpose the array first. In your case, you just need to transpose the array (such that you have the groups in rows), pass it to `ss.friedmanchisquare` and then to the `posthoc_wilcoxon` function. You should obtain the correct result in that case.
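The unpacking behaviour can be checked with a small sketch. The data are random and purely illustrative, with the `(19, 13)` shape from the question. To stay self-contained, the pairwise step is written out by hand with `scipy.stats.wilcoxon` instead of calling `posthoc_wilcoxon`, so it only mirrors the shape of the result, not the library's exact output.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data: 19 blocks (rows) x 13 groups (columns)
data = rng.normal(size=(19, 13))

# Unpacking a 2-D array yields its rows, so *data would pass 19 samples
# of length 13. Transposing first passes 13 samples of length 19.
samples = list(data.T)
print(len(samples), samples[0].shape)

# Friedman test across the 13 groups
stat, p = stats.friedmanchisquare(*data.T)

# Pairwise Wilcoxon signed-rank tests over the same 13 groups
n_groups = data.shape[1]
pvals = np.ones((n_groups, n_groups))
for i in range(n_groups):
    for j in range(i + 1, n_groups):
        pvals[i, j] = pvals[j, i] = stats.wilcoxon(data[:, i], data[:, j]).pvalue

print(pvals.shape)
```

The resulting matrix has one row and one column per group, which is the `(13, 13)` shape expected in the question.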
Thanks for the clarification
Hi,
I cannot use post-hoc tests for dataframes with `melted = True` and `group_col != 'groups'`, `block_col != 'blocks'` and `y_col != 'y'`. Basically, anything which deviates from the example breaks the code. The error is likely due to `__convert_to_block_df` (https://github.com/maximtrp/scikit-posthocs/blob/master/scikit_posthocs/_posthocs.py), which returns the old `y_col`, `group_col`, `block_col` values but assigns the column names "groups" / "blocks" / "y".

On a somewhat related note: I wanted to implement / use these tests to plot CD diagrams as suggested in "J. Demsar (2006), Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, 7, 1-30." which you also cite in the documentation. However, I have a difficult time understanding what "block", "groups", and "y" mean in this context. More specifically, are blocks (or groups?) different classifiers or datasets, and is y the ranks or the accuracies? You don't happen to have some example code or an explanation of how to plot CD diagrams?
Thank
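To make the melted layout concrete, here is a small pandas sketch. It assumes one common reading of Demsar's setup: datasets act as blocks, classifiers as groups, and y is the accuracy. The column names `dataset`, `classifier` and `accuracy` and all values are hypothetical. Renaming to the default names is only a possible workaround for versions without the fix, not something the library requires.

```python
import pandas as pd

# Hypothetical accuracies: rows are datasets (blocks), columns are classifiers (groups)
wide = pd.DataFrame(
    {"svm": [0.81, 0.74, 0.90], "rf": [0.85, 0.79, 0.88], "knn": [0.78, 0.70, 0.86]},
    index=["iris", "wine", "digits"],
)
wide.index.name = "dataset"

# Long ("melted") form: one row per (dataset, classifier) pair
melted = wide.reset_index().melt(
    id_vars="dataset", var_name="classifier", value_name="accuracy"
)

# Renaming to the default column names avoids passing custom names at all
default_named = melted.rename(
    columns={"classifier": "groups", "dataset": "blocks", "accuracy": "y"}
)

# Pivoting back recovers the blocks-by-groups matrix
block_matrix = default_named.pivot(index="blocks", columns="groups", values="y")
print(default_named.head())
```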