Hi David, thanks for your comment. In the paper I was considering the difference of accuracy between classifiers, which is bounded in [-1, 1]. Alternative bounds should be considered if the metric has a different scale, but it seems to me that `[-max(abs(x)), max(abs(x))]` is sensible as long as we deal with paired differences between algorithms.
Does this address your question? (PS: I'll also check your other comments later.)
best, Giorgio
Hi Giorgio, thanks a lot for your response. :slightly_smiling_face:
You're right, my mistake: `[-1, 1]` is the correct interval for differences of accuracy; I simply overlooked that the scale changes from `[0, 1]` to `[-1, 1]` as soon as you consider differences. The same goes for the general case of `[min(x), max(x)]` changing to `[-max(abs(x)), max(abs(x))]`.
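Spelled out: for a metric $s$ bounded in $[m, M]$, the paired difference is bounded in a symmetric interval,

```math
s_A, s_B \in [m, M] \quad\Longrightarrow\quad s_A - s_B \in [\,m - M,\; M - m\,],
```

so accuracy with $[m, M] = [0, 1]$ gives differences in $[-1, 1]$.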
I'll close this issue.
Here and here (Stan implicitly samples this from a uniform, which is the intended behaviour as described in the publications on the hierarchical model), the prior on delta is set as `Uniform(-max(abs(x)), max(abs(x)))`. In the publications on the hierarchical model, this is `Uniform(-1, 1)`, presumably since accuracy is the metric under consideration, whose maximum is 1 (although this does not explain the lower bound of -1).

My question is: Wouldn't `Uniform(min(x), max(x))` be better suited in general than the current choice of `Uniform(-max(abs(x)), max(abs(x)))`? Many metrics are asymmetric around 0. Because of that, using `Uniform(-max(abs(x)), max(abs(x)))` as the default prior on delta seems unnatural to me, but I may be mistaken?
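For illustration (the numbers are made up): if one algorithm consistently outperforms the other, the symmetric default places half of its prior mass on a sign that was never observed:

```python
import numpy as np

# Hypothetical paired differences between two algorithms for a metric
# that is not symmetric around 0 (here, one algorithm always wins).
x = np.array([0.8, 1.1, 0.9, 1.3, 0.7])

# Current default: prior on delta symmetric around 0.
b = np.max(np.abs(x))
symmetric_bounds = (-b, b)            # (-1.3, 1.3)

# Alternative considered here: follow the observed range instead.
data_bounds = (np.min(x), np.max(x))  # (0.7, 1.3)

print(symmetric_bounds)  # half the prior mass lies on negative values
print(data_bounds)       # concentrated where the data actually are
```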
A possible way to overcome this issue flexibly would be to introduce another parameter to `HierarchicalTest.sample`, named e.g. `data_set_mean_prior`, that handles the probably most-used cases but also allows users to specify the lower and upper bound themselves (a rough sketch follows below).

What do you think? Should I create a PR for this?
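Roughly what I have in mind (a hypothetical sketch only: the helper name `_delta_prior_bounds` is made up, and how `HierarchicalTest.sample` would forward the resolved bounds to the Stan model is glossed over):

```python
import numpy as np

def _delta_prior_bounds(x, data_set_mean_prior="symmetric"):
    """Hypothetical helper: resolve the bounds of the uniform prior on delta.

    `data_set_mean_prior` is either a keyword for a common case or an
    explicit (lower, upper) tuple supplied by the user.
    """
    if data_set_mean_prior == "symmetric":
        # Current behaviour: Uniform(-max(abs(x)), max(abs(x))).
        bound = np.max(np.abs(x))
        return -bound, bound
    if data_set_mean_prior == "data":
        # Alternative discussed above: Uniform(min(x), max(x)).
        return np.min(x), np.max(x)
    # Anything else is taken to be user-specified (lower, upper) bounds.
    lower, upper = data_set_mean_prior
    return lower, upper
```

`sample` would then pass the resolved bounds to the Stan model in place of the hard-coded `(-max(abs(x)), max(abs(x)))`.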