Allow for no adjustment of p-values

juancq commented 1 year ago

Current adjustment options are holm and bonferroni. API should support no adjustment.

mirkobunse commented 1 year ago

Can you clarify why you need this feature?

Strictly speaking, the adjustment is necessary for producing statistically meaningful plots. Each comparison of two methods (whether they can be distinguished and therefore should not be connected) is carried out as a hypothesis test. Hence, each comparison has a small chance of producing a false positive, i.e., the plot might lack a connection where one should be. This chance is small only because we set the significance level to be small (5% by default). Now, if we omit the adjustment, each of the comparisons still has this small chance of producing a false positive; across many comparisons, this leads to a considerable chance that at least one of them does. In this case, the plot is not meaningful anymore because any apparent difference between two methods might be a false one. This is the problem of multiple testing.
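To put rough numbers on this, here is a minimal sketch of how the chance of at least one false positive grows with the number of comparisons. It assumes that every pair of methods is tested and that the tests are independent, which is a simplification (pairwise tests on the same data are not independent). The function names are made up for illustration and are not part of critdd.

```python
def num_pairwise_comparisons(k: int) -> int:
    """Number of pairwise tests among k methods (assumes all pairs are tested)."""
    return k * (k - 1) // 2


def family_wise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m unadjusted tests.

    Assumes all m null hypotheses are true and the tests are independent;
    this is an illustrative simplification, not a property of real benchmarks.
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    alpha = 0.05
    for k in (3, 5, 10):
        m = num_pairwise_comparisons(k)
        fwer = family_wise_error_rate(alpha, m)
        print(f"{k} methods, {m} comparisons: chance of a false positive = {fwer:.2f}")
```

Under these assumptions, five methods already mean ten comparisons and roughly a 40% chance that at least one connection is missing by mistake.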

juancq commented 1 year ago

Correction for multiple testing is not without its problems: https://www.jstor.org/stable/20065622. Ultimately, it should be up to the user's discretion to set this variable.

mirkobunse commented 1 year ago

Thanks for the reference; reading it was quite exciting for me. I also agree with you that, in general, software should not be built on opinionated design choices. However, I strongly disagree with the author of the reference - not necessarily in general, but certainly in terms of critical difference diagrams - and I fear that adding a "no adjustment" option will tempt users to produce diagrams that are outright meaningless.

In essence, Rothman gives two arguments against adjustments in multiple testing:

Argument 1: Adjustments increase the number of type II errors

As Rothman puts it:

"Unfortunately, the cost of the insurance policy [i.e., the adjustment] is to increase the frequency of incorrect statements that assert no relation between two factors"

What adjustments reduce is the power of each test. However, the correct interpretation of any non-rejected null hypothesis is never that the null hypothesis was correct - the interpretation is merely that the data does not allow conclusions to be drawn. Hence, there is no such thing as an assertion of unrelatedness.

Argument 2: Random data of some unrelated test can prevent the rejection of the current test

Again, quoting Rothman,

"irrelevant information from the data can diminish the informativeness of an association of possible interest."

This statement is only true for Bonferroni's correction. In contrast, Holm's correction rejects only the most "rejectable" null hypotheses. Random data would produce the highest p values, which Holm's correction would simply ignore. Such data does not have an effect on the rejection of false null hypotheses.
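The difference between the two corrections can be sketched in a few lines. This is a textbook-style illustration with made-up p values, not the code that critdd uses; the function names are hypothetical. Bonferroni compares every p value to alpha / m, whereas Holm sorts the p values, relaxes the threshold step by step, and stops at the first p value that fails.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject each null hypothesis whose p value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure, returning decisions in the input order."""
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold grows from alpha / m up to alpha as rank increases.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # the first failure stops the procedure; the rest is retained
    return reject


if __name__ == "__main__":
    # Made-up p values: two small ones and two that look like pure noise.
    p = [0.012, 0.015, 0.8, 0.9]
    print("Bonferroni:", bonferroni_reject(p))
    print("Holm:      ", holm_reject(p))
```

With these example values, Bonferroni rejects only the first hypothesis, while Holm also rejects the second one; the two large p values are never reached by Holm's procedure. Note that the smallest p value is compared to alpha / m in both procedures.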

Additional remark

Rothman further states that adjustments are motivated by two presumptions, (i) that chance causes many unusual findings and (ii) that observations caused by mere chance should not be investigated further.

His argument against these presumptions is exciting and thought-provoking. However, it does not apply to critical difference diagrams.

The usual purpose of critical difference diagrams is to convince a beholder that certain sets of treatments lead to different outcomes. This purpose is best achieved by assuming a "worst-case beholder" (think of a highly critical reviewer of a scientific publication): we should be prepared to convince even someone who adversarially believes (i) that all treatments might lead to the same outcome and (ii) that none of the proposed treatments should be investigated further (compare these beliefs to Rothman's presumptions). Most likely, we will personally disagree with this worst-case beholder. Nevertheless, convincing this person requires our diagrams to build precisely on the two presumptions that Rothman criticizes.
