mlcollyer / RRPP

RRPP: An R package for fitting linear models to high-dimensional data using residual randomization
https://mlcollyer.github.io/RRPP/
4 stars 1 forks source link

a factor is significant but none of their levels area? #10

Closed lauraDRH closed 1 year ago

lauraDRH commented 1 year ago

Hi!

First, thank you so much for developing this package, it is very interesting.

I have a small issue, but it might be because I am not educated enough in statistics and not so much because of the package, but I hope you can still help me.

I have a community dataset and three factors I want to checck (island, sample_type and area). I first created a dissimilarity matrix based on Bray-Curtis distance, and then created the lm:

desv <- vegdist(esv, "bray")
rp <- rrpp.data.frame(desv=desv, island=meta$island, sample_type=meta$sample_type, area=meta$area)
fit <- lm.rrpp(desv ~ sample_type*island*area, data = rp) 

When I check the summary, every factor is significant:

>       anova(fit)

Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals 
Number of permutations: 1000 
Estimation method: Ordinary Least Squares 
Sums of Squares and Cross-products: Type I 
Effect sizes (Z) based on F distributions

                         Df     SS     MS     Rsq       F       Z Pr(>F)   
sample_type               7 36.164 5.1663 0.38966 11.4357 17.2473  0.001 **
island                    1  2.076 2.0760 0.02237  4.5953  6.3526  0.001 **
area                      2  2.249 1.1244 0.02423  2.4889  5.1362  0.001 **
sample_type:island        7  6.403 0.9148 0.06900  2.0249  8.5347  0.001 **
sample_type:area          6  4.206 0.7010 0.04532  1.5518  3.4881  0.001 **
island:area               2  1.875 0.9376 0.02021  2.0754  4.7157  0.001 **
sample_type:island:area   6  3.694 0.6157 0.03980  1.3628  2.2732  0.013 * 
Residuals                80 36.142 0.4518 0.38942                          
Total                   111 92.809  

I want to do a pairwise comparison to check, for example, where are the differences between area, but there are no significant differences between the levels

>       PW1 <- pairwise(fit, groups = rp$area)
>       summary(PW1)

Pairwise comparisons

Groups: 1 10 20 

RRPP: 1000 permutations

LS means:
Vectors hidden (use show.vectors = TRUE to view)

Pairwise distances between means, plus statistics
              d UCL (95%)         Z Pr > d
1:10  0.4525679 0.5005808 -1.350562  0.909
1:20  0.4034966 0.4558469 -1.575418  0.947
10:20 0.3032880 0.3604918 -1.832476  0.965

Am I doing something wrong? Also, what does d and UCL stand for?

Thank you so much for your time Laura

mlcollyer commented 1 year ago

Dear Laura,

There is an inconsistency in how area was evaluated between the two analyses. The ANOVA is based on sequential sums of squares and cross-products (also called type I, the default for lm.rrpp, but type II and type III are also available). The sample_type term was evaluated against a null model with only an intercept; the island term was evaluated against a null model with sample_type; area was evaluated against a null model with sample_type + island; etc. (Note that you also used as a default a method to add all main terms before adding interactions. You could ask the function to add interactions first, if type I SSCP is desired but you would prefer to have the sample_type:island interaction evaluated before area is added.)

In the pairwise function, you used the default null model choice, which would be sample_type + island + area + sample_type:island + sample_type:area + island:area, as this is the next to last model used as a null model in type I SSCP evaluation (adding sample_type:island:area would achieve a full model without any additional terms to add). The pairwise function does this because you did not specify what the null model should be, so it assumes as a default it was the last null model used in the type I SSCP estimation. You do not find significant results because area and most of its interactions are already in the null model.

If you wish to be consistent with your method of analysis, I recommend making a null model like this:

fit.null <- lm.rrpp(desv ~ sample_type + island, data = rp)

and make sure to specify this in the pairwise analysis.

I also recommend that you brush up on types of sums of squares and cross-products to make sure the type I route is the one you wish to pursue. I could see where a type II SSCP might be more informative, but it depends on your goals for analysis.

Hope this helps!

Mike

On Jul 30, 2023, at 7:24 AM, Laura @.***> wrote:

Hi!

First, thank you so much for developing this package, it is very interesting.

I have a small issue, but it might be because I am not educated enough in statistics and not so much because of the package, but I hope you can still help me.

I have a community dataset and three factors I want to checck (island, sample_type and area). I first created a dissimilarity matrix based on Bray-Curtis distance, and then created the lm:

desv <- vegdist(esv, "bray") rp <- rrpp.data.frame(desv=desv, island=meta$island, sample_type=meta$sample_type, area=meta$area) fit <- lm.rrpp(desv ~ sample_typeislandarea, data = rp) When I check the summary, every factor is significant:

  anova(fit)

Analysis of Variance, using Residual Randomization Permutation procedure: Randomization of null model residuals Number of permutations: 1000 Estimation method: Ordinary Least Squares Sums of Squares and Cross-products: Type I Effect sizes (Z) based on F distributions

                     Df     SS     MS     Rsq       F       Z Pr(>F)   

sample_type 7 36.164 5.1663 0.38966 11.4357 17.2473 0.001 island 1 2.076 2.0760 0.02237 4.5953 6.3526 0.001 area 2 2.249 1.1244 0.02423 2.4889 5.1362 0.001 sample_type:island 7 6.403 0.9148 0.06900 2.0249 8.5347 0.001 sample_type:area 6 4.206 0.7010 0.04532 1.5518 3.4881 0.001 island:area 2 1.875 0.9376 0.02021 2.0754 4.7157 0.001 sample_type:island:area 6 3.694 0.6157 0.03980 1.3628 2.2732 0.013 * Residuals 80 36.142 0.4518 0.38942
Total 111 92.809
I want to do a pairwise comparison to check, for example, where are the differences between area, but there are no significant differences between the levels

  PW1 <- pairwise(fit, groups = rp$area)
  summary(PW1)

Pairwise comparisons

Groups: 1 10 20

RRPP: 1000 permutations

LS means: Vectors hidden (use show.vectors = TRUE to view)

Pairwise distances between means, plus statistics d UCL (95%) Z Pr > d 1:10 0.4525679 0.5005808 -1.350562 0.909 1:20 0.4034966 0.4558469 -1.575418 0.947 10:20 0.3032880 0.3604918 -1.832476 0.965 Am I doing something wrong? Also, what does d and UCL stand for?

Thank you so much for your time Laura

— Reply to this email directly, view it on GitHub https://github.com/mlcollyer/RRPP/issues/10, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUNU4TAOTZMLDNSXOV5BRLXSY77HANCNFSM6AAAAAA25EDGOU. You are receiving this because you are subscribed to this thread.

lauraDRH commented 1 year ago

Thanks Mike, that was very helpfull! I now see the problem of not defining the null model, makes sense. I will still have to read more about the different types of SSCP. Do you have any recommendations on where to start with that?

mlcollyer commented 1 year ago

Hi Laura,

I would put my faith in a good textbook on experimental design that covers factorial designs/models, but one can also piece together the important parts with internet searches.

I found this succinct presentation https://sscc.nimh.nih.gov/sscc/gangc/SS.html. the take-home point one should acquire is ANY effect (term) in ANOVA is based on the SS that is calculated, and these values can be calculated in different ways. The different way are defined by the null (reduced) and full models used; i.e., upon which terms evaluation of the target term is conditioned.

Cheers! Mike

On Jul 30, 2023, at 2:29 PM, Laura @.***> wrote:

Thanks Mike, that was very helpfull! I now see the problem of not defining the null model, makes sense. I will still have to read more about the different types of SSCP. Do you have any recommendations on where to start with that?

— Reply to this email directly, view it on GitHub https://github.com/mlcollyer/RRPP/issues/10#issuecomment-1657237162, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUNU4XGOJP3A6IASSJRTOLXS2RZBANCNFSM6AAAAAA25EDGOU. You are receiving this because you commented.