Open sophieleech opened 1 month ago
If you have three categories, you will have two coefficients, each compared to the reference category. By default, Zicoseq outputs ANOVA-type test results, so no directions will be indicated in the volcano plot. If you want to study each comparison (c2 vs c1, c3 vs c1 with c1 being the reference), you can create two dummy variables indicating c2 and c3 category. you then test one dummy variable at a time. To create dummy variables, you can use “model.matrix“ function.JunSent from my iPhoneOn Oct 16, 2024, at 8:52 AM, sophieleech @.***> wrote: Hi, Previously I have used this tool with groups containing two categories. To extract the results as a data frame rather than a plot - I modelled my script on the plot function as below:
grp.name <- ZicoSeq.obj.p$grp.name
meta.dat <- ZicoSeq.obj.p$meta.dat
abundance <- ZicoSeq.obj.p$feature.dat
prevalence <- apply(ZicoSeq.obj.p$feature.dat, 1, function(x) mean(x >
0))
coefs <- sapply(ZicoSeq.obj.p$coef.list, function(x) x[grep(grp.name,
rownames(x)), ])
colnames(coefs) <- paste0("coef_Func", 1:length(ZicoSeq.obj.p$coef.list))
R2 <- rowMaxs(ZicoSeq.obj.p$R2)
signs <- t(sign(coefs))[t(ZicoSeq.obj.p$R2 == R2)]
signs[signs == 0] <- 1
plot.data <- data.frame(fdr.pvals = ZicoSeq.obj.p[['p.adj.fdr']],
R2 = R2 * signs, taxa = rownames(ZicoSeq.obj.p$R2),
prevalence = prevalence, abundance = abundance)
... where the direction of the relationship is defined by the sign of the coefficient, and the strength defined by the R2 value. However, how does this work when there are 3 categories? I can see that there is a coefficient for both e.g. one for control vs Disease1 and control vs Disease2 but I noticed that in the plot function it states '...with the sign indicating the association direction determined based on the sign of the regression coefficients (for multi-categorical variables, sign is not applicable).' Is it still correct to use the sign value from each of the coefficients to determine the direction of relationship despite the plot function not using this feature? Cheers, Sophie Leech
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Jun - appreciate the speedy response. So just to clarify - the coefficients (indicating direction) are still valid - but the R2 and p-value are ANOVA-style. To get individual p-values I would need to split them and test them one at a time. Any suggestion for appropriately adjusting the p-values for multiple comparisons after that or would it not be so necessary? From my understanding Zicoseq uses some kind of permutation to do FDR control?
Yes, it uses permutation-based FDR control. Raw p-value may not be accurate since the number of permutations is not large.BestSent from my iPhoneOn Oct 18, 2024, at 8:49 AM, sophieleech @.***> wrote: Hi Jun - appreciate the speedy response. So just to clarify - the coefficients (indicating direction) are still valid - but the R2 and p-value are ANOVA-style. To get individual p-values I would need to split them and test them one at a time. Any suggestion for appropriately adjusting the p-values for multiple comparisons after that or would it not be so necessary? From my understanding Zicoseq uses some kind of permutation to do FDR control?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
Hi,
Previously I have used this tool with groups containing two categories.
To extract the results as a data frame rather than a plot - I modelled my script on the plot function as below:
... where the direction of the relationship is defined by the sign of the coefficient, and the strength defined by the R2 value.
However, how does this work when there are 3 categories? I can see that there is a coefficient for both e.g. one for control vs Disease1 and control vs Disease2 but I noticed that in the plot function it states '...with the sign indicating the association direction determined based on the sign of the regression coefficients (for multi-categorical variables, sign is not applicable).'
Is it still correct to use the sign value from each of the coefficients to determine the direction of relationship despite the plot function not using this feature?
Cheers, Sophie Leech