Open lhqing opened 2 years ago
min_cov = 5; quantile = 0.1
a. For each sample, with the cov > min_cov filter, choose the top and bottom quantile DMRs as the sample-based hypo- and hyper-DMRs: hypo-DMR status -1, hyper-DMR status 1, no status 0.
b. For each DMR, with the cov > min_cov filter, choose the top and bottom quantile DMR as the sample-based hypo- and hyper-DMR: hypo-DMR status -1, hyper-DMR status 1, no status 0.
c. Combine the hypo- and hyper- status from a and b: save a status only where it is consistent (a == b), otherwise save 0.
d. The samples can be all the L4Regions and the SubCluster labels.
CisTarget hypo enrichment shows good cell type specificity, while hyper has little enrichment. DEM does not work well: motifs are uniformly enriched, with no clear cell type specificity. The final motif hits will be the cistarget-hypo-DMR motif hits.
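Steps a to c of the status calling above can be sketched roughly as follows. This is only an illustration under assumptions not stated in the notes: the inputs are a DMR-by-sample methylation fraction matrix and a matching coverage matrix, the quantile is taken over methylation fraction, low fraction means hypo, and step b ranks samples within each DMR. The function names are hypothetical and not part of any existing package.

```python
import numpy as np

MIN_COV = 5     # from the notes: min_cov = 5
QUANTILE = 0.1  # from the notes: quantile = 0.1


def quantile_status(frac, cov, axis, min_cov=MIN_COV, quantile=QUANTILE):
    """Return -1 (hypo), 1 (hyper) or 0 for every entry, ranking along `axis`.

    Assumption: `frac` and `cov` are (n_dmr, n_sample) arrays.
    """
    # Entries failing the coverage filter become NaN, so they are ignored
    # by the quantile computation and never receive a status.
    masked = np.where(cov > min_cov, frac, np.nan)
    low = np.nanquantile(masked, quantile, axis=axis, keepdims=True)
    high = np.nanquantile(masked, 1 - quantile, axis=axis, keepdims=True)
    status = np.zeros(frac.shape, dtype=np.int8)
    status[masked <= low] = -1   # bottom quantile: assumed hypo
    status[masked >= high] = 1   # top quantile: assumed hyper
    return status


def consistent_status(frac, cov, min_cov=MIN_COV, quantile=QUANTILE):
    """Combine the per-sample and per-DMR calls, keeping only agreeing ones."""
    # Step a: within each sample (column), rank across DMRs.
    per_sample = quantile_status(frac, cov, 0, min_cov, quantile)
    # Step b: within each DMR (row), rank across samples.
    per_dmr = quantile_status(frac, cov, 1, min_cov, quantile)
    # Step c: keep the status where a == b, otherwise 0.
    return np.where(per_sample == per_dmr, per_sample, 0)
```

Calling `consistent_status(frac, cov)` returns a matrix of the same shape as `frac`. Ties at the quantile boundary are all assigned a status in this sketch, which may not match the actual pipeline.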
tf_nes = dmr_ds["tf_nes"].to_pandas() # use NES per TF, agg from max(motifs)
cemba.get_gene_fracs() # TF mCH frac
dmr_ds = cemba.get_mc_dmr_ds(add_motif_hits=True)
# final TF-by-DMR hits (or adj matrix for the TF DMR graph)
dmr_ds['tf_gene_hits']
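The "NES per TF, agg from max(motifs)" comment above could look something like the pandas sketch below. The motif names, TF names, cell type labels and the `aggregate_nes_per_tf` helper are all made up for illustration; how the real `tf_nes` variable is built inside `dmr_ds` is not shown in these notes.

```python
import pandas as pd


def aggregate_nes_per_tf(motif_nes, motif_to_tf):
    """For each TF and cell type, keep the maximum NES over the TF's motifs.

    motif_nes: DataFrame, rows are motifs, columns are cell types.
    motif_to_tf: Series mapping motif id (index) to TF name (value).
    """
    return motif_nes.groupby(motif_to_tf).max()


# Hypothetical motif-level NES table.
motif_nes = pd.DataFrame(
    {"CellTypeA": [3.2, 1.1, 0.4], "CellTypeB": [0.5, 4.0, 2.5]},
    index=["motif_1", "motif_2", "motif_3"],
)
# Hypothetical motif-to-TF annotation; one TF can own several motifs.
motif_to_tf = pd.Series({"motif_1": "TF_X", "motif_2": "TF_X", "motif_3": "TF_Y"})

tf_nes_example = aggregate_nes_per_tf(motif_nes, motif_to_tf)
```

Taking the maximum means a TF is considered enriched in a cell type if any one of its motifs is enriched there.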
TF_GENE correlation
How the results look
histplot showing the Pearson correlation value distribution
scatterplot showing TF and Gene expression
TF and Gene expression plot on l4region clusters
How to get the results: all results are combined into one zarr: ecker-rachel-analysis/tf_gene/TF_Gene_Corelation.zarr
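For reference, a minimal sketch of the TF-by-gene Pearson correlation behind the histogram and scatter plots listed above. It assumes two per-cluster matrices sharing the same cluster order. Which values go in (expression or mCH fraction) and the `pairwise_pearson` name are assumptions. This is not the code that produced the zarr above.

```python
import numpy as np


def pairwise_pearson(tf_values, gene_values):
    """Pearson r between every TF column and every gene column.

    Both inputs are (n_clusters, n_features) arrays sharing the cluster axis.
    Returns an (n_tfs, n_genes) matrix. Constant columns give NaN.
    """
    # Z-score each column using the population standard deviation (ddof=0),
    # so the mean of the product of z-scores is exactly Pearson r.
    tf_z = (tf_values - tf_values.mean(axis=0)) / tf_values.std(axis=0)
    gene_z = (gene_values - gene_values.mean(axis=0)) / gene_values.std(axis=0)
    return tf_z.T @ gene_z / tf_values.shape[0]
```

The flattened matrix (`corr.ravel()`) is what a histogram of correlation values would be drawn from, and a single TF column against a single gene column gives one scatter plot.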
TF-CRE-Gene Regulon
ecker-hanqing-analysis/220924-dmr-motif-scan
cemba.get_mc_dmr_ds(add_motif=True) # to get the DMR RegionDS with the motif scan matrix