tramarobin / fctSnPM

Using the spm1d package (v.0.4.3), computes ANOVA and post hoc tests (anova1 through anova3rm) with a non-parametric approach (permutation tests)
GNU General Public License v3.0

[JOSS Review] "alphaOriginal" code error? #8

Closed — 0todd0000 closed this issue 3 years ago

0todd0000 commented 3 years ago

On Lines 51-55 in fctPostHoc1d.m the following commands appear:

        if ~isempty(nT)
            alphaOriginal=0.05/nT;
        else
            alphaOriginal=alphaT/nComp;
        end

and a similar if-else statement appears on Lines 71-75 in fctPostHoc2d.m.

Shouldn't the 0.05 be instead the user's choice of alpha when calling fctSPM?

For example, if a user calls:

fctSPM(a, b, c, 'alpha',0.1);

it would appear that this choice of 0.1 is not propagated to the post hoc tests. Please check.

tramarobin commented 3 years ago

alpha defines the alpha risk for the ANOVA. This option is indeed not propagated to the post hoc tests. This was intentional, as we thought only the main tests needed to be corrected if multiple ANOVAs are performed; once the main test is done, the post hoc tests start with an initial alpha risk of 5%, corrected with Bonferroni. This was reinforced by the idea that a post hoc test can only be significant if the ANOVA is significant at the same location.

Do you think the alpha risk for the post hoc tests should be automatically adjusted (before Bonferroni) based on the alpha risk of the ANOVA?

Also, there are two optional inputs to define the initial alpha risk for the post hoc tests:

alphaT defines the initial alpha risk for the post hoc tests before the Bonferroni correction: alphaOriginal = alphaT/nComp, nComp being the number of post hoc tests.

nT defines the number of comparisons directly, so if nT=1, all post hoc tests are performed with an alpha risk of 5%. This is not recommended, but it was implemented in case someone wanted the same alpha risk for all post hoc tests.
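To make the two options concrete, here is a rough Python sketch of that if/else (illustrative only — the package itself is MATLAB, and the names mirror those in fctPostHoc1d.m):

```python
def post_hoc_alpha(alpha_t, n_comp, n_t=None):
    """Per-comparison alpha for the post hoc t-tests.

    Mirrors the if/else in fctPostHoc1d.m: if nT is supplied, a fixed
    0.05 is divided by nT; otherwise alphaT is Bonferroni-corrected
    by the number of comparisons nComp.
    """
    if n_t is not None:
        return 0.05 / n_t      # nT path: hard-coded 0.05 (the issue raised above)
    return alpha_t / n_comp    # default path: alphaT / nComp

# 6 pairwise comparisons with the default alphaT of 0.05:
print(post_hoc_alpha(0.05, 6))         # ≈ 0.008333
# nT=1 runs every post hoc test at an alpha risk of 5%:
print(post_hoc_alpha(0.05, 6, n_t=1))  # 0.05
```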

0todd0000 commented 3 years ago

Consider the following:

  1. False positive implications: Different fields / applications require different alpha levels, depending on the scientific/medical/practical implications of a false positive. Values of alpha=0.01 or even alpha=0.001 are not uncommon in some fields.
  2. Multiple comparisons: If 1000 different variables are analyzed using ANOVA, there is a very high probability of a false positive, so alpha needs to be much lower than 0.05 to protect against this possibility.
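To illustrate point 2: the probability of at least one false positive among m independent tests, each run at level alpha, is 1 - (1 - alpha)^m. A quick Python check (illustrative only, not part of the package):

```python
# Family-wise false positive probability for m independent tests at level alpha.
def familywise_fpr(alpha, m):
    return 1.0 - (1.0 - alpha) ** m

print(familywise_fpr(0.05, 1000))         # ≈ 1.0: a false positive is near-certain
print(familywise_fpr(0.05 / 1000, 1000))  # ≈ 0.049: Bonferroni restores control
```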

From the documentation it appears that the user can indeed control alpha, but Lines 51-55 in fctPostHoc1d.m show that this is not the case. Please ensure that the user's choice of alpha is propagated everywhere.



Do you think the alpha risk for the post hoc tests should be automatically adjusted (before Bonferroni) based on the alpha risk of the ANOVA?

Yes, it must match the user's alpha choice.



alphaT defines the initial alpha risk for the post hoc tests before the Bonferroni correction: alphaOriginal = alphaT/nComp, nComp being the number of post hoc tests.

I don't understand alphaT. Why is alphaT not alpha? Why does the user have separate control over this parameter? As far as I know there shouldn't be different alpha levels for main tests and post hoc analyses. Is there a reference from the literature to describe this parameter or which demonstrates its use?



nT defines the number of comparisons directly, so if nT=1, all post hoc tests are performed with an alpha risk of 5%. This is not recommended, but it was implemented in case someone wanted the same alpha risk for all post hoc tests.

Doesn't the design choice determine nT? Please explain alphaT and nT in greater detail. If these are ad hoc parameters meant for exploratory purposes, then they should probably be removed because they will disagree with the original alpha parameter.

0todd0000 commented 3 years ago

Please refer to my final comment in #9, which I will paraphrase here:

The relation amongst alpha, alphaT, nComp and nT is very confusing, and I am concerned that it may even be invalid. My understanding is that the user can freely vary nT and/or alphaT and thereby artificially lower or raise p values and/or thresholds as desired. If true, this is highly undesirable behavior and is also invalid in terms of false positive rate (FPR) control.

Please note that I now regard this as a major issue. Please explain how arbitrary alphaT and nT can control FPR. If they cannot control FPR, then I would need to recommend that they be removed from the package entirely.

tramarobin commented 3 years ago

alpha is now propagated to the post hoc tests, and then corrected with Bonferroni. alphaT is now the same as alpha by default. nT was indeed an ad hoc parameter and has been deleted.

By default, the alpha risk used during the post hoc tests is the same as the alpha risk for the ANOVA, with the Bonferroni correction. This correction depends on nComp, which is the number of t-tests performed; this value is not modifiable by the user.

However, most of the colleagues I've been working with 'reset' the alpha risk to 0.05 after the ANOVA, even if the alpha risk of the ANOVA was lower. Consequently, I left the possibility for the user to change this value. I don't have a scientific reference showing that this is good practice (I searched on the internet but couldn't find any relevant answer...), but the practice is nevertheless common. Maybe I can add a warning when this value is modified?

For instance, in D1_ANOVA1rm with an alpha of 0.01, there are 6 comparisons, so the alpha for the post hoc tests is 0.01/6 ≈ 0.0017. This value can be found in the posthoc.mat file at posthoc.tTests.alpha.

Concerning the cluster probabilities: I modified them to take into account the alpha risk and the number of comparisons. On lines 107-110 of fctPostHoc1d, the cluster probabilities are corrected as clustersT{c}.P*(0.05/alpha). Is this correction OK? Or should it take into account only the number of comparisons and be clustersT{c}.P*nComp?

0todd0000 commented 3 years ago

However, most of the colleagues I've been working with 'reset' the alpha risk to 0.05 after the ANOVA, even if the alpha risk of the ANOVA was lower. Consequently, I left the possibility for the user to change this value. I don't have a scientific reference showing that this is good practice (I searched on the internet but couldn't find any relevant answer...), but the practice is nevertheless common. Maybe I can add a warning when this value is modified?

This practice is troubling. It is not problematic if the post hoc analyses agree with the original ANOVA results, but if one can freely vary alphaT, then one can create any post hoc results they want. Reporting would also become highly problematic; reviewers will want to know why there are two different alpha values for a single ANOVA. I would strongly discourage this practice, and I'm afraid that I cannot recommend publication if alphaT is freely modifiable. This modifiability would encourage poor practice.

It would be preferable to provide different post hoc options (i.e., not just Bonferroni). The problem is that alternative, robust post hoc procedures are more difficult to implement for continuum / functional data than they are for simple data, and this is why post hoc procedures are not directly implemented in spm1d.

To summarize: I believe this package adds valuable, automated post hoc functionality to spm1d, but I cannot recommend publication if it encourages or automates poor post hoc practices. Please remove alphaT as an option.

I regard this alphaT issue as major, and the cluster p-values issue as comparatively minor, so I will transfer the latter to a new, separate issue.

tramarobin commented 3 years ago

I would strongly discourage this practice, and I'm afraid that I cannot recommend publication if alphaT is freely modifiable. This modifiability would encourage poor practice.

It is understandable. Before I delete this option, I just have additional suggestions.

It is not problematic if the post hoc analyses agree with the original ANOVA results.

This is exactly what this function does. The post hoc will appear significant if and only if the ANOVA is significant at the same nodes.

...if alphaT is freely modifiable. This modifiability would encourage poor practice.

Do you mean that people could set an alphaT value above 0.05? If so, I can add an option that limits the range of this value between 0.05 and alpha, with a warning message if alphaT is modified. Besides, the same thing can happen with alpha.

I think people with a good reason to modify the alpha or alphaT value will justify it in their paper, and people without statistical knowledge will use this function as it is, that is to say, with a single alpha value.

0todd0000 commented 3 years ago

This is exactly what this function is doing. The post hoc will appear significant only and only if the ANOVA is significant at the same nodes.

How does it achieve this? Please summarize in text and point me to the relevant lines of code in the .m files.

It is not problematic if the post hoc analyses agree with the original ANOVA results.

Can you please cite the source of this quote? It does not appear to be from this issue. I suspect that this quote comes from a discussion regarding manual and not algorithmic post hoc analyses.

tramarobin commented 3 years ago

Can you please cite the source of this quote? It does not appear to be from this issue. I suspect that this quote comes from a discussion regarding manual and not algorithmic post hoc analyses.

This was on the message you wrote just before my answer:

This practice is troubling. It is not problematic if the post hoc analyses agree with the original ANOVA results, but if one can freely vary alphaT, then one can create any post hoc results they want.

How does it achieve this? Please summarize in text and point me to the relevant lines of code in the .m files.

In fctSPM.m, line 274: the output anovaEffects is a structure of cells corresponding to each effect or interaction of the ANOVA. Each cell contains logical values, with the significant nodes of the ANOVA equal to 1.

anovaEffects is an input of fctPostHoc1d.m and fctPostHoc2d.m (lines 278-280).

For a 1-way ANOVA in 1D, in fctPostHoc1d.m lines 102-105: posthoc.tTests.Tsignificant{1,comp}=zeros(dimensions(1),dimensions(2)); creates a zero (non-significant) variable. mapLogical=abs(posthoc.tTests.Tcontinuum{1,comp})>=posthoc.tTests.Tthreshold{comp}; gives the logical values for the t-test. posthoc.tTests.Tsignificant{1,comp}(anovaEffects{1})=mapLogical(anovaEffects{1}); keeps only the significant values common to mapLogical and anovaEffects. posthoc.tTests.Tsignificant is thus corrected to take the effect of the ANOVA into consideration.
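A rough Python/NumPy sketch of that masking step (illustrative only — the data values are made up, and the package itself is MATLAB):

```python
import numpy as np

# Made-up 1D example over 10 nodes.
t_continuum  = np.array([1.0, 2.5, 3.1, 0.4, 2.9, 3.5, 1.2, 0.1, 2.8, 3.3])
t_threshold  = 2.7
anova_effect = np.array([0, 0, 1, 1, 1, 1, 0, 0, 0, 1], dtype=bool)  # ANOVA-significant nodes

# mapLogical analogue: nodes where the t-test alone crosses the threshold
map_logical = np.abs(t_continuum) >= t_threshold

# Tsignificant analogue: starts all-zero, filled only where the ANOVA is significant
t_significant = np.zeros_like(anova_effect)
t_significant[anova_effect] = map_logical[anova_effect]

print(t_significant.astype(int))  # [0 0 1 0 1 1 0 0 0 1]
```

A node is reported significant only where both the ANOVA and the t-test agree.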

This is rather easy for a 1-way ANOVA. However, I might need some help justifying my approach for 2-way and 3-way ANOVAs when there is an interaction effect. My approach is, as in scalar analysis, to consider the main effect (post hoc result corrected with the ANOVA) if there is no interaction on the same nodes, and to consider interactions otherwise. When there is a triple interaction, the main effect (for instance C) and the two interaction effects (AxC and BxC) also correspond to the post hoc results corrected with the ANOVA.

For a 2-way ANOVA in 1D, in fctPostHoc1d.m, the same procedure as for the 1-way ANOVA is performed for the main effects (lines 278-282). Then, on line 325, mainForInteraction{mainEffect}=posthoc.tTests.Tsignificant; is the main effect corrected with the ANOVA. This variable is then used when interpreting the interactions.

For interactions, fctPostHoc1d line 908: indiceMain=findWhichMain(modalitiesAll{testedEffect{comp}},combi{Comp{comp}(1)}(testedEffect{comp}),combi{Comp{comp}(2)}(testedEffect{comp})); is a bit messy, but this function gets the index used to select the results of the corresponding ANOVA depending on the comparison performed. tMainEffect=abs(mainForInteraction{testedEffect{comp}}{indiceMain})>0; represents one of the two main effects of the ANOVA.
effectCorr=anovaEffects{3}(:); is the interaction effect of the ANOVA.

line 920: posthoc.tTests.Tsignificant{1,comp}(effectCorr)=mapLogical(effectCorr); the post hoc is corrected with the result of the ANOVA.

Lines 922-923: tMainEffect(effectCorr==1)=0; corrects the main effect of the ANOVA by making non-significant the nodes where there is an interaction effect. realEffect{comp}=reshape(max([tMainEffect(:)';posthoc.tTests.Tsignificant{1,comp}(:)']),dimensions(1),dimensions(2)); sets to 1 the nodes where there is a main or an interaction effect, with priority for the interaction effect. Line 930: posthoc.tTests.Tsignificant{1,comp}=realEffect{comp}; uses realEffect as Tsignificant.
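Here is a rough Python/NumPy sketch of that priority logic (illustrative only; the logical maps are made up and the names loosely follow the MATLAB variables):

```python
import numpy as np

# Made-up logical maps over 8 nodes.
t_main      = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)  # ANOVA-corrected main effect
effect_corr = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)  # ANOVA interaction effect
map_logical = np.array([1, 1, 1, 1, 0, 0, 0, 1], dtype=bool)  # raw post hoc t-test result

# Line 920 analogue: post hoc kept only where the ANOVA interaction is significant
t_significant = np.zeros(8, dtype=bool)
t_significant[effect_corr] = map_logical[effect_corr]

# Lines 922-930 analogue: drop the main effect where an interaction exists,
# then keep nodes showing either a main or an interaction effect
t_main_adj = t_main.copy()
t_main_adj[effect_corr] = False
real_effect = t_main_adj | t_significant  # elementwise max of the two logical maps

print(real_effect.astype(int))  # [1 1 1 0 1 0 0 0]
```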

For the 3-way ANOVA, the procedure is the same; however, the results of the double interactions are corrected by the main effect and the first two interactions.

Line 705: intForInteractions{anovaFixedCorr(effectFixed)}.t=realEffect; this value is already corrected by the main effect, as in the 2-way ANOVA.

Lines 912-918: indiceInteraction=findWhichInteraction(intForInteractions{intLocations(testedEffect{comp},interactions)}.comp,combi(Comp{comp}),interactions,testedEffect{comp}); is a bit messy, but this function gets the index used to select the results of the right ANOVA depending on the comparison performed. tInteractionEffect{interactions}=intForInteractions{intLocations(testedEffect{comp},interactions)}.t{indiceInteraction}(:); represents the main and the two interaction effects of the ANOVA (if effect C is tested, this corresponds to the main effect C and to AxC and BxC, with priority to interactions as previously described). effectCorr=anovaEffects{7}; is the triple interaction of the ANOVA.

line 920: posthoc.tTests.Tsignificant{1,comp}(effectCorr)=mapLogical(effectCorr); the post hoc is corrected with the result of the interaction effect of the ANOVA.

Lines 925-930: tInteractionEffect{interactions}(effectCorr==1)=0; the previously observed effects are corrected with the triple interaction effect of the ANOVA. realEffect{comp}=reshape(max([tInteractionEffect{1}';tInteractionEffect{2}';posthoc.tTests.Tsignificant{1,comp}(:)']),dimensions(1),dimensions(2)); represents the triple interaction effect considering the main and double interaction effects. posthoc.tTests.Tsignificant{1,comp}=realEffect{comp}; uses realEffect as Tsignificant.

I hope this helps you understand my approach: the idea is to correct the effect of the last tests with the previously performed tests, considering both the ANOVA and the post hoc results. The method is the same in 2D.

0todd0000 commented 3 years ago

Can you please cite the source of this quote? It does not appear to be from this issue. I suspect that this quote comes from a discussion regarding manual and not algorithmic post hoc analyses.

This was on the message you wrote just before my answer: "This practice is troubling. It is not problematic if the post hoc analyses agree with the original ANOVA results, but if one can freely vary alphaT, then one can create any post hoc results they want."

OK, thank you. My point was that the biggest problem would be if post hoc results disagree with ANOVA results. If they do not disagree, then the biggest problem is avoided. However, this does not mean that the post hoc results are valid.

From your description of the code, this is clearly an ad hoc approach, and in particular this line:

posthoc.tTests.Tsignificant{1,comp}(anovaEffects{1})=mapLogical(anovaEffects{1});

This is an ad hoc adjustment of t results and is not based on theory. This is problematic, mainly because it almost certainly does not yield valid results for arbitrary ANOVA.

Would you be willing to do one or more of the following?

  1. Remove all code based on alphaT. In my view this is the best option, as including alphaT and the above ad hoc correction code will incorrectly suggest to users that the approach is valid.

  2. Numerically validate the code. Conduct a simulation study involving various ranges of main and interaction effects, and quantify the behavior of this post hoc approach. However, based on your description of the code above, I suspect that this simulation will show that the approach is actually not valid.

  3. Provide warnings in the documentation and in the user interface that use of alphaT has not been validated, and should be used only for data exploration purposes, and mustn't be used for reporting purposes. This would require minimal code adjustment, but would require the issue to be adequately documented in all of (i) the paper, (ii) the online documentation, (iii) the code comments, and (iv) the user interface (i.e., warnings, etc.). One way to emphasize this in the user interface would be to change "alphaT" to "unvalidated_posthoc_alpha" or something similar, so that the user is aware that the approach is not validated.

In my view (3) would be easiest, and would also be sufficient for publication.

tramarobin commented 3 years ago

Provide warnings in the documentation and in the user interface that use of alphaT has not been validated, and should be used only for data exploration purposes, and mustn't be used for reporting purposes. This would require minimal code adjustment, but would require the issue to be adequately documented in all of (i) the paper, (ii) the online documentation, (iii) the code comments, and (iv) the user interface (i.e., warnings, etc.). One way to emphasize this in the user interface would be to change "alphaT" to "unvalidated_posthoc_alpha" or something similar, so that the user is aware that the approach is not validated.

I applied this option.

By default, alphaT is the same as the alpha defined for the ANOVA. If the value of alphaT is modified, a warning is displayed at the start of the post hoc procedure ("Post-hoc analysis is not valid. Please keep the same alpha risk for ANOVA and post hoc tests").

Besides, a warning ("alphaOriginal is not valid") is saved in spmAnalysis.posthoc.tTests.warning if alphaT is modified.

There are 4 outputs in spmAnalysis.posthoc to better understand the alpha risk chosen and used by the analysis:

There might be a difference between alphaBonferronni and pCritical if the number of iterations possible with the dataset is not sufficient to achieve pCritical = alphaBonferronni. In this case, warnings are displayed and saved in tTests.nWarning. tTests.nWarning represents the number of warnings displayed during the analysis: 0 is OK; 1 means the number of iterations was reduced but pCritical = alphaBonferronni; 2 means the number of iterations was reduced and pCritical > alphaBonferronni. In the latter case, more subjects are required to perform the analysis.

These explanations are present in the online documentation as well as in the functions.

  • alphaT. Do not modify except for exploratory purposes. alphaT is the original alpha used for the post hoc tests (the Bonferroni correction is applied afterwards as alphaT/number of comparisons). Default is the same as alpha. @isnumeric.

I also applied the correction discussed in issue #6 to improve the outputs (creation of spmAnalysis). You can also find modified examples with fctSPMS, which performs the statistical analysis without plotting and saving the results. It should be easier and faster to verify the above-mentioned modifications.

0todd0000 commented 3 years ago

OK, thank you for confirming and for your explanations.



I find the variable name tTests.alphaBonferronni to be a bit confusing, and I suggest changing it to tTests.pCriticalBonferroni --- or something similar --- for the following reasons:



There might be a difference between alphaBonferronni and pCritical if the number of iterations possible with the dataset is not sufficient to achieve pCritical = alphaBonferronni.

I do not understand how pCritical and alphaBonferroni can be different, please explain.

tramarobin commented 3 years ago

As suggested, alphaBonferronni was changed to pBonferronni, and alpha is used instead of alphaOriginal.

I do not understand how pCritical and alphaBonferroni can be different, please explain.

Permutation tests require at least 1/alpha iterations to achieve an analysis that can meet the precision of alpha. For an alpha of 5%, 20 iterations are required. When pBonferronni is reduced, the number of iterations required to achieve the analysis increases. It is possible that the dataset does not contain enough data to perform the required number of iterations. NB: the maximal number of iterations is based on Ttest_inf.nPermUnique of the spm1d package, and corresponds to the number of unique datasets squared.

When there are not enough permutations to achieve pBonferronni, the maximal number of iterations with the dataset, tTests.maxIterations, is used and pCritical is set to 1/tTests.maxIterations. If this pCritical is greater than pBonferronni, warning messages are displayed and saved in tTests.nWarning instead of stopping the analysis. To overcome this issue, more subjects must be included in the analysis (making more iterations possible), or a simpler experimental design (fewer interactions and fewer groups, to keep pBonferronni higher) is needed.
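To illustrate the relation, here is a simplified Python sketch (not the package code; it omits the intermediate warning case where iterations are reduced but the threshold is still met):

```python
import math

def p_critical(p_bonferroni, max_iterations):
    """Smallest critical p achievable with a finite number of permutations.

    A permutation test needs at least 1/alpha iterations to resolve a
    threshold of alpha; with fewer, the effective critical p rises to
    1/maxIterations and a warning would be issued (nWarning = 2).
    """
    required = math.ceil(1.0 / p_bonferroni)
    if max_iterations >= required:
        return p_bonferroni, 0       # enough permutations: no warning
    return 1.0 / max_iterations, 2   # pCritical > pBonferronni: warning

# alpha = 0.05 needs only 20 iterations:
print(p_critical(0.05, 1000))     # (0.05, 0)
# a Bonferroni-corrected 0.01/6 needs ~600, so 100 permutations is too few:
print(p_critical(0.01 / 6, 100))  # (0.01, 2)
```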

0todd0000 commented 3 years ago

Oops, sorry about that question! I was thinking so much about alpha and alphaT that I forgot that the package focusses on nonparametric tests. Your answer is good, as is the software's handling of this issue.

Minor points:

Do you intend to remove alphaBonferroni completely?

tramarobin commented 3 years ago

Oops, sorry about that question! I was thinking so much about alpha and alphaT that I forgot that the package focusses on nonparametric tests. Your answer is good, as is the software's handling of this issue.

No problems.

Do you intend to remove alphaBonferroni completely?

Yes I do. I was focused on the changes and forgot to push the changes to repository... I think it is ok now, sorry.

0todd0000 commented 3 years ago

Thank you. I found just one remaining occurrence. In README.md: tTests.warning : only if alphaOrignial is modified with alphaT input

tramarobin commented 3 years ago

It should be corrected now.

0todd0000 commented 3 years ago

OK, thank you very much!

This issue was my only major concern. Thank you very much for your prompt attention to this matter, and also for being willing to make code changes. I think that the programming interface is now easier to understand, at least for me.