janezd / baycomp

MIT License

`HierarchicalTest` when `nfolds == 1` #13

Closed. dpaetzel closed this issue 2 years ago.

dpaetzel commented 2 years ago

I'm sorry to bother you with another question. I have a case where I do not need cross-validation (I can generate as much data as I want). This means that the number of runs equals the number of scores I have for each learning task: runs == nscores.

In that case, HierarchicalTest correctly computes nfolds as 1. However, this results in the correlation being computed as

rho == 1 / nfolds == 1

I may be mistaken, but under the hierarchical model, rho == 1 here means maximal correlation between the runs for learning task i, whereas intuitively that correlation should be smaller than, say, the correlation induced by k-fold cross-validation for any k. (I'm not entirely sure, but shouldn't rho be 0 in this case, i.e. a diagonal covariance matrix for learning task i, since the runs are essentially uncorrelated?)
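To make the concern concrete, here is a minimal NumPy sketch. It assumes (an assumption for illustration, not taken from baycomp's source) that the per-task covariance has the compound-symmetry form implied by the thread: variance sigma² on the diagonal and rho·sigma² off it, with rho = 1 / nfolds. The helpers `heuristic_rho` and `compound_symmetry_cov` are hypothetical names.

```python
import numpy as np


def heuristic_rho(nfolds):
    """Correlation heuristic discussed in this thread: rho = 1 / nfolds."""
    return 1.0 / nfolds


def compound_symmetry_cov(n_obs, sigma, rho):
    """Hypothetical helper: sigma**2 on the diagonal, rho * sigma**2 elsewhere."""
    corr = np.full((n_obs, n_obs), rho, dtype=float)
    np.fill_diagonal(corr, 1.0)
    return sigma**2 * corr


runs = 20
for nfolds in (10, 1):
    rho = heuristic_rho(nfolds)
    cov = compound_symmetry_cov(runs * nfolds, sigma=1.0, rho=rho)
    # A rank-deficient covariance means the observations are treated as
    # perfectly redundant copies of one another.
    rank = np.linalg.matrix_rank(cov)
    print(f"nfolds={nfolds}: rho={rho}, rank={rank} of {cov.shape[0]}")

# rho = 0 would instead give a diagonal covariance, i.e. independent runs.
independent = compound_symmetry_cov(runs, sigma=1.0, rho=0.0)
print(np.allclose(independent, np.eye(runs)))
```

Under this assumed form, nfolds == 10 gives a full-rank 200 × 200 matrix, while nfolds == 1 gives a 20 × 20 matrix of rank 1: the 20 runs carry the information of a single observation.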

gcorani commented 2 years ago

However, this is not the right test if you do not have multiple folds, since it expects to make inferences from n observations coming from n folds. That is, even if the test handled the correlation correctly, it would have only a single observation to analyze. In that case, I would rather apply a (Bayesian) signed-rank test on the n instances of the test set. The drawback of that analysis is that you would be conditioning on a single training set.
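As a rough illustration of this alternative, the sketch below implements a Dirichlet-process version of the Bayesian signed-rank test (following the formulation of Benavoli et al.) in plain NumPy. It is not baycomp's implementation: the function name `bayesian_signed_rank`, the prior pseudo-observation placed at 0 with strength `prior_strength`, and all default values are assumptions. `x` and `y` are assumed to be paired scores of the two algorithms on the same n test instances.

```python
import numpy as np


def bayesian_signed_rank(x, y, rope=0.0, prior_strength=0.5, nsamples=5000, seed=0):
    """Hypothetical sketch of a Dirichlet-process Bayesian signed-rank test.

    Returns an array of shape (nsamples, 3) whose columns are posterior samples of
    the probability that a pairwise (Walsh) average of x - y lies below -rope,
    inside [-rope, rope], or above rope.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Prepend a prior pseudo-observation at 0 ("no difference").
    z = np.concatenate(([0.0], diff))
    # Walsh sums z_i + z_j over all ordered pairs, including i == j.
    walsh = z[:, None] + z[None, :]
    left = walsh < -2 * rope
    right = walsh > 2 * rope
    alpha = np.ones(z.size)
    alpha[0] = prior_strength
    samples = np.empty((nsamples, 3))
    for k in range(nsamples):
        w = rng.dirichlet(alpha)
        pair_w = np.outer(w, w)  # weights sum to 1 over all ordered pairs
        p_left = pair_w[left].sum()
        p_right = pair_w[right].sum()
        samples[k] = (p_left, max(0.0, 1.0 - p_left - p_right), p_right)
    return samples


# Hypothetical paired scores of two classifiers on the same 30 test instances.
x = np.linspace(0.80, 0.90, 30)
y = x - 0.05
probs = bayesian_signed_rank(x, y, rope=0.01).mean(axis=0)
print(dict(zip(("x worse", "rope", "x better"), probs.round(3))))
```

The output is only the three posterior probabilities (worse / rope / better); it does not describe how large the difference is.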

dpaetzel commented 2 years ago

Doesn't the test make inferences from nfolds * runs observations? For example, if runs == 20 and nfolds == 1, I have nSamples == 20 (here and then here), so shouldn't sampling still give me a meaningful distribution?

(I'd rather not use the Bayesian signed-rank test, since it only gives me the probabilities of better/rope/worse, whereas I need an estimate of the distribution of the differences to gauge how much better or worse one algorithm performs than the other.)

gcorani commented 2 years ago

Hi, the test is designed to handle cross-validation results, that is, results coming from multiple folds and possibly multiple runs. I would not use it if you have a single test set. If you want to use a parametric test on a single test set, you can use a Bayesian t-test. See Kruschke's paper ("Bayesian estimation supersedes the t test").
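Kruschke's model uses a t-distributed likelihood fitted by MCMC. As a much simpler stand-in (a swapped-in technique, not Kruschke's model), the sketch below draws from the closed-form posterior of the mean difference under a Normal likelihood with the noninformative prior p(mu, sigma²) ∝ 1 / sigma². Unlike the signed-rank test, it yields a posterior distribution over the size of the mean difference on a single test set. The function name `posterior_mean_difference` and the synthetic scores are hypothetical.

```python
import numpy as np


def posterior_mean_difference(x, y, nsamples=20000, seed=0):
    """Posterior samples of the mean of d = x - y under a Normal likelihood
    and the noninformative prior p(mu, sigma^2) proportional to 1 / sigma^2."""
    rng = np.random.default_rng(seed)
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    mean, var = d.mean(), d.var(ddof=1)
    # sigma^2 | d  ~  scaled inverse chi-square(n - 1, var)
    sigma2 = (n - 1) * var / rng.chisquare(n - 1, size=nsamples)
    # mu | sigma^2, d  ~  Normal(mean, sigma^2 / n)
    return rng.normal(mean, np.sqrt(sigma2 / n))


rng = np.random.default_rng(1)
y = rng.normal(0.80, 0.02, size=50)      # hypothetical scores of algorithm B
x = y + rng.normal(0.01, 0.01, size=50)  # hypothetical scores of algorithm A
mu = posterior_mean_difference(x, y)
lo, hi = np.percentile(mu, [2.5, 97.5])
print(f"posterior mean {mu.mean():.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
print("P(mean difference > 0) =", (mu > 0).mean())
```

Marginally, mu follows a Student t with n - 1 degrees of freedom centred on the sample mean; sampling it keeps the example close to how one would summarize an MCMC posterior.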

dpaetzel commented 2 years ago

Thank you for your assessment and your patience. I thought I could save myself some work: I already knew the Kruschke model, but it does not quite model the case of “two algorithms on several learning tasks” (or at least not as well as your HierarchicalTest). However, I guess I won't be able to avoid extending the Kruschke model in order to properly conduct the analysis I have in mind.

gcorani commented 2 years ago

I guess you should modify the code of the hierarchical test so that the observations come from a Normal rather than from an MVN. However, it might also be necessary to adapt some further minor details.
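A minimal sketch of what this change amounts to, assuming the per-task observations are modeled with a compound-symmetry covariance (sigma² on the diagonal, rho·sigma² off it): with rho = 0 the multivariate normal log-likelihood equals the sum of independent univariate Normal log-likelihoods, so replacing the MVN by a Normal is equivalent to dropping the correlation term. This is plain NumPy, not baycomp's actual model code; the helper names and the synthetic scores are hypothetical.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, via slogdet and a linear solve."""
    diff = np.asarray(x, dtype=float) - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + quad)


def normal_logpdf(x, mean, sigma):
    """Sum of independent univariate Normal log-densities."""
    z = (np.asarray(x, dtype=float) - mean) / sigma
    return np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi))


def compound_symmetry_cov(n_obs, sigma, rho):
    """Hypothetical per-task covariance: sigma**2 on the diagonal, rho * sigma**2 elsewhere."""
    corr = np.full((n_obs, n_obs), rho, dtype=float)
    np.fill_diagonal(corr, 1.0)
    return sigma**2 * corr


rng = np.random.default_rng(0)
mu, sigma = 0.02, 0.05
scores = rng.normal(mu, sigma, size=20)  # hypothetical differences for one task

# With rho = 0 the MVN factorizes into independent Normals.
print(np.isclose(mvn_logpdf(scores, mu, compound_symmetry_cov(20, sigma, 0.0)),
                 normal_logpdf(scores, mu, sigma)))
```

In a full hierarchical model, the same substitution would be made inside the per-task likelihood, leaving the priors on the task-level means unchanged.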