martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License
114 stars 16 forks source link

Issue with plot using scdrs.util.plot_group_stats #97

Closed NaomiHuntley closed 2 months ago

NaomiHuntley commented 3 months ago

Hello, I am working with some GWAS data along with the TMS Facs mouse single cell dataset. I am trying to create the cell heterogeneity plot based on the tutorial, however my plot ends up with a lot cell types even after I try to filter by fdr. Can you please help me understand what I am doing wrong. Maybe I need to agglomerate by cell ontology level? Below is the code I am using, the resulting data I am trying to plot, and the plot. Thank you in advance for the support!

Load and prep data

adata = sc.read_h5ad("processed_data.h5ad") df_gs1 = pd.read_csv("lop_geneset.gs", sep="\t", index_col=0)

df_gs = df_gs1.loc[ ["GWAS__Akk.munge.gs" ] ]

dict_score = { trait: pd.read_csv(f"/storage/group/computeScore/{trait}.full_score.gz", sep="\t", index_col=0) for trait in df_gs.index }

for trait in dict_score: adata.obs[trait] = dict_score[trait]["norm_score"]

dict_df_stats = { trait: pd.read_csv(f"/storage/group//downstream/{trait}.scdrs_group.cell_ontology_class", sep="\t", index_col=0) for trait in ["GWAS_Akk.munge.gs"] }

Filter each DataFrame in the dictionary based on a condition

filtered_dict_df_stats = {}

for trait, df in dict_df_stats.items(): filtered_df = df[df['n_fdr_0.1'] < 0.1] if not filtered_df.empty: filtered_dict_df_stats[trait] = filtered_df

Plot

fig, ax = scdrs.util.plot_group_stats( dict_df_stats=filtered_dict_df_stats, plot_kws={ "cb_vmax": 0.2, # Customize as needed "cb_fraction": 0.12 } )

Screenshot 2024-08-12 at 2 11 56 PM Screenshot 2024-08-12 at 2 11 41 PM

KangchengHou commented 3 months ago

@NaomiHuntley I believe you are referring to the large number of columns on the x-axis. Currently these columns are defined cell_ontology_class - if you want to reduce the number of columns, you can create a new column in adata.obs aggregating cell_ontology_class (e.g., creating cell_ontology_superclass) and then run the analysis with that column.

NaomiHuntley commented 2 months ago

Thank you for the clarification!