jchen1981 / ZicoSeq

3 stars 0 forks source link

Use of sign with three categorical groups as group of interest? #6

Open sophieleech opened 1 month ago

sophieleech commented 1 month ago

Hi,

Previously I have used this tool with groups containing two categories.

To extract the results as a data frame rather than a plot - I modelled my script on the plot function as below:

##ripped from zicoseq.plot function 
    grp.name <- ZicoSeq.obj.p$grp.name
    meta.dat <- ZicoSeq.obj.p$meta.dat

    abundance <- ZicoSeq.obj.p$feature.dat
    prevalence <- apply(ZicoSeq.obj.p$feature.dat, 1, function(x) mean(x > 
                                                                                                                                        0))

    coefs <- sapply(ZicoSeq.obj.p$coef.list, function(x) x[grep(grp.name, 
                                                                                                                        rownames(x)), ])
    colnames(coefs) <- paste0("coef_Func", 1:length(ZicoSeq.obj.p$coef.list))

    R2 <- rowMaxs(ZicoSeq.obj.p$R2)

    signs <- t(sign(coefs))[t(ZicoSeq.obj.p$R2 == R2)]
    signs[signs == 0] <- 1
    plot.data <- data.frame(fdr.pvals = ZicoSeq.obj.p[['p.adj.fdr']], 
            R2 = R2 * signs, taxa = rownames(ZicoSeq.obj.p$R2),
                prevalence = prevalence, abundance = abundance)

... where the direction of the relationship is defined by the sign of the coefficient, and the strength defined by the R2 value.

However, how does this work when there are 3 categories? I can see that there is a coefficient for both e.g. one for control vs Disease1 and control vs Disease2 but I noticed that in the plot function it states '...with the sign indicating the association direction determined based on the sign of the regression coefficients (for multi-categorical variables, sign is not applicable).'

Is it still correct to use the sign value from each of the coefficients to determine the direction of relationship despite the plot function not using this feature?

Cheers, Sophie Leech

jchen1981 commented 1 month ago

If you have three categories, you will have two coefficients, each compared to the reference category.  By default, Zicoseq outputs ANOVA-type test results, so no directions will be indicated in the volcano plot. If you want to study each comparison (c2 vs c1, c3 vs c1 with c1 being the reference), you can create two dummy variables indicating c2 and c3 category. you then  test one dummy variable at a time. To create dummy variables, you can use “model.matrix“ function.JunSent from my iPhoneOn Oct 16, 2024, at 8:52 AM, sophieleech @.***> wrote: Hi, Previously I have used this tool with groups containing two categories. To extract the results as a data frame rather than a plot - I modelled my script on the plot function as below:

ripped from zicoseq.plot function

grp.name <- ZicoSeq.obj.p$grp.name
meta.dat <- ZicoSeq.obj.p$meta.dat

abundance <- ZicoSeq.obj.p$feature.dat
prevalence <- apply(ZicoSeq.obj.p$feature.dat, 1, function(x) mean(x > 
                                                                                                                                    0))

coefs <- sapply(ZicoSeq.obj.p$coef.list, function(x) x[grep(grp.name, 
                                                                                                                    rownames(x)), ])
colnames(coefs) <- paste0("coef_Func", 1:length(ZicoSeq.obj.p$coef.list))

R2 <- rowMaxs(ZicoSeq.obj.p$R2)

signs <- t(sign(coefs))[t(ZicoSeq.obj.p$R2 == R2)]
signs[signs == 0] <- 1
plot.data <- data.frame(fdr.pvals = ZicoSeq.obj.p[['p.adj.fdr']], 
        R2 = R2 * signs, taxa = rownames(ZicoSeq.obj.p$R2),
            prevalence = prevalence, abundance = abundance)

... where the direction of the relationship is defined by the sign of the coefficient, and the strength defined by the R2 value. However, how does this work when there are 3 categories? I can see that there is a coefficient for both e.g. one for control vs Disease1 and control vs Disease2 but I noticed that in the plot function it states '...with the sign indicating the association direction determined based on the sign of the regression coefficients (for multi-categorical variables, sign is not applicable).' Is it still correct to use the sign value from each of the coefficients to determine the direction of relationship despite the plot function not using this feature? Cheers, Sophie Leech

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

sophieleech commented 1 month ago

Hi Jun - appreciate the speedy response. So just to clarify - the coefficients (indicating direction) are still valid - but the R2 and p-value are ANOVA-style. To get individual p-values I would need to split them and test them one at a time. Any suggestion for appropriately adjusting the p-values for multiple comparisons after that or would it not be so necessary? From my understanding Zicoseq uses some kind of permutation to do FDR control?

jchen1981 commented 1 month ago

Yes, it uses permutation-based FDR control. Raw p-value may not be accurate since the number of permutations is not large.BestSent from my iPhoneOn Oct 18, 2024, at 8:49 AM, sophieleech @.***> wrote: Hi Jun - appreciate the speedy response. So just to clarify - the coefficients (indicating direction) are still valid - but the R2 and p-value are ANOVA-style. To get individual p-values I would need to split them and test them one at a time. Any suggestion for appropriately adjusting the p-values for multiple comparisons after that or would it not be so necessary? From my understanding Zicoseq uses some kind of permutation to do FDR control?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>