Closed QianChwnLyn closed 7 months ago
It is probably a multiple hypothesis testing issue. For this figure, we controlled FDR all cell type-trait pairs at a stringent threshold <0.05 (benjamini hochberg). You can try a higher threshold like 0.1 or 0.2.
Hello, is it modified during the perform-downstream process? I didn't find the relevant parameters in the parameters of the function. Also set "df_fdr_prop"=0.2 in the drawing function scdrs.util.plot_group_stats and the result remains unchanged.
发件人: Martin Jinye @.> 发送时间: 2023年11月20日 23:15 收件人: @.> 抄送: @.>; @.> 主题: Re: [martinjzhang/scDRS] The graphs from a single result and multiple results are inconsistent (Issue #74)
It is probably a multiple hypothesis testing issue. For this figure, we controlled FDR all cell type-trait pairs at a stringent threshold <0.05 (benjamini hochberg). You can try a higher threshold like 0.1 or 0.2.
― Reply to this email directly, view it on GitHubhttps://github.com/martinjzhang/scDRS/issues/74#issuecomment-1819258281, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AYQEHKHWGNES3ITAKVN5RHDYFNXXRAVCNFSM6AAAAAA7SN6CFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJZGI2TQMRYGE. You are receiving this because you authored the thread.Message ID: @.***>
Hi @KangchengHou , can you help with this?
Hi @QianChwnLyn
The small square for AD-MG denote cell-type-level association significance at FDR < 0.05, and therefore is expected to change if you plot one trait vs. multiple traits.
And I can confirm that plot_group_stats
does not have the option to vary FDR significance threshold. Perhaps the most straightforward option is to plot your results using AD.scdrs_group.SubClass
file directly. The column you want to use is assoc_mcz
.
Hi @KangchengHou
The drawing I understand is to put together the *.scdrs_group.SubClass results of multiple gene sets, and the drawing code does not see any further processing? So why do the results of drawing one and drawing multiple ones differ?
And I found that there is no such problem in some data when multiple gene sets are plotted on one graph. If this can be solved?
Hi @QianChwnLyn
I want to point out that this is not a bug but rather a statistical nuance.
In multiple hypothesis testing (e.g., FDR control), the significance of a given hypothesis depends on p-values of other hypotheses. In our case, each hypothesis is a cell type-disease pair. If we test the hypotheses across diseases instead of just one disease, the results will change.
@KangchengHou, do you think it makes sense to update the plot_group_stats
function with a tunable FDR threshold?
yes see https://github.com/martinjzhang/scDRS/pull/75/commits/83d626445aeaa4bc556e7dcb553844921b2e2dc7 (i can't directly merge into master which is protected)
on the other hand, user could just take the raw dataframe and plot in the style as they want
@KangchengHou
Thanks! How can I get your modified scdrs/util.py file replace my installed scdrs/util.py file? Maybe I need to modify the util.py according to your method?
you could git clone the latest version to use this updated scdrs
Hi, thanks for the discussion! I have a follow-up question.
The output dataframe of the "_scdrs.method.downstream_groupanalysis" function has the columns of "assoc_mcp" and "assoc_mcz". Are the values in these two columns already corrected for multiple comparison, as indicated by the abbreviation "mc(multiple comparison)"? Or did I interpret the abbreviation incorrectly?
no. "mc" refers to "Monte Carlo". you will still need to apply the FDR control procedure.
no. "mc" refers to "Monte Carlo". you will still need to apply the FDR control procedure.
Oh Thank you so much!
I also want my p value to be adjusted at the cell type level only, instead of for each cell-type-disease combination. I understand this is not a bug and the rationale behind the original paper, for using cell-type disease combination for correcting multiple testing. I think both ways can be justified.
Therefore, I found a way to customize the multi-disease/phenotype graph, which demonstrates the fdr corrected at cell type level only. This requires you to serve the the _df_fdrprop, _df_assocfdr, and _df_heterofdr separately, instead of using the _dict_dfstats parameter.
First, I calculated the BH-adjusted p value for each phenotype:
## Association FDR
for trait in dict_df_stats.keys():
dict_df_stats[trait]['assoc_fdr']=multipletests(dict_df_stats[trait]['assoc_mcp'], method='fdr_bh')[1]
## Heterogeneous FDR
for trait in dict_df_stats.keys():
dict_df_stats[trait]['hetero_fdr']=multipletests(dict_df_stats[trait]['hetero_mcp'], method='fdr_bh')[1]
Then, I defined the df_fdr_prop, df_assoc_fdr, and df_hetero_fdr separately, for which I also modified the fdr df to frd<0.2.
fig, ax = scdrs.util.plot_group_stats(## dataframe of proportion of cells with FDR < 0.1: modified to 0.2
df_fdr_prop= pd.DataFrame({trait: dict_df_stats[trait]['n_fdr_0.2']/dict_df_stats[trait]['n_cell']
for trait in dict_df_stats.keys()}).transpose(),
## dataframe of group-level association statistics
df_assoc_fdr= pd.DataFrame({trait: dict_df_stats[trait]['padj']
for trait in dict_df_stats.keys()}).transpose(),
## dataframe of group-level heterogeneity statistics
df_hetero_fdr= pd.DataFrame({trait: dict_df_stats[trait]['hetero_mcp']
for trait in dict_df_stats.keys()}).transpose(),
## threshold for FDR correction of cell type-level mean association statistics
assoc_fdr_threshold =0.2,
## Plotting parameters
plot_kws={"cb_vmax": 0.1,### max value of the color bar
"cb_fraction":0.1### fraction of colorbar
}
)
Thank you again for this great package and meaningful discussion.
Hello, when I looked at the enrichment of AD symptoms in my data alone, I found that AD was enriched in MG. But when I looked at multiple diseases at the same time, it was not significant. Please tell me how to solve this problem.