lhqing / whole_mouse_brain

misc code for whole mouse brain analysis
MIT License

Construct TF-CRE-Gene Regulon by combining all genomic features #24

Open lhqing opened 2 years ago

lhqing commented 2 years ago

TF-CRE-Gene Regulon

  1. TF - CRE link (see Motif Enrichment Analysis below)
    • [x] Scan motifs: ecker-hanqing-analysis/220924-dmr-motif-scan; cemba.get_mc_dmr_ds(add_motif=True) # to get DMR RegionDS with motif scan matrix
    • [x] perform motif enrichment for each cluster and meaningful cluster groups
    • [x] Create graph adjacency matrix for TF-CRE
  2. Gene - CRE link
    • [ ] correlation
    • [ ] GBT model predictability
    • [x] 3C physical approximation #16
    • [ ] Create graph adjacency matrix for Gene-CRE
  3. Gene - TF link
    • [x] correlation
    • [ ] GBT model predictability
    • [ ] Create adjacency matrix for Gene-TF
  4. Construct Regulon
    • [ ] link eRegulon by raw adjacency matrix
    • [ ] filter by GSEA Leading edge analysis
    • [ ] quantification in each cluster
    • [ ] QC regulon
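
A minimal sketch of how the "link eRegulon by raw adjacency matrix" step could work, assuming the three links are stored as dense arrays: a binary TF-by-CRE matrix, a binary CRE-by-gene matrix, and a TF-by-gene correlation matrix. The function name `link_regulon`, the `min_abs_corr` threshold, and the toy values are all hypothetical and not taken from the project code.

```python
import numpy as np

def link_regulon(tf_cre, cre_gene, tf_gene_corr, min_abs_corr=0.3):
    """Count the CREs connecting each TF to each gene, keeping correlated pairs only."""
    # (n_tf, n_cre) @ (n_cre, n_gene) -> number of shared CREs per TF-gene pair
    n_shared = tf_cre.astype(int) @ cre_gene.astype(int)
    # Hypothetical filter: drop TF-gene pairs with weak expression correlation
    keep = np.abs(tf_gene_corr) >= min_abs_corr
    return np.where(keep, n_shared, 0)

# Toy data: 2 TFs, 3 CREs, 2 genes
tf_cre = np.array([[1, 1, 0],
                   [0, 0, 1]])
cre_gene = np.array([[1, 0],
                     [1, 0],
                     [0, 1]])
tf_gene_corr = np.array([[0.8, 0.1],
                         [-0.05, -0.6]])

regulon = link_regulon(tf_cre, cre_gene, tf_gene_corr)
print(regulon)
```

At the real scale (thousands of TFs and genes, many more DMRs) sparse matrices would likely be needed; the dense version is only meant to show the matrix product.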
lhqing commented 2 years ago

Motif Enrichment Analysis

  1. Use the pycistarget motif collection (4096 motifs / motif clusters with a mouse TF annotation) to scan DMR regions (slop -b 150), resulting in a motif-by-DMR dataset that can be loaded with the DMR RegionDS
  2. Determine region sets to run motif enrichment analysis (min_cov = 5; quantile = 0.1)
    a. for each sample, with cov > min_cov filter, choose top and bottom quantile DMR as the sample-based hypo- and hyper-DMR; hypo-DMR status -1, hyper-DMR status 1, no status 0
    b. for each dmr, with cov > min_cov filter, choose top and bottom quantile DMR as the sample-based hypo- and hyper-DMR; hypo-DMR status -1, hyper-DMR status 1, no status 0
    c. combine hypo- and hyper- status from a and b; only save consistent status when a == b, otherwise save 0
    d. sample can be all the L4Regions, and the SubCluster labels
  3. For each set identified in step 2, run motif enrichment with the DEM and CisTarget methods, and take the consistent hypo- or hyper-enriched motifs
  4. Combine all results to get cistrome of all TFs by all regions
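
Step 2 above can be sketched roughly as follows, assuming a sample-by-DMR table of methylation fractions (`frac`) and a matching coverage table (`cov`). This reads step a as a quantile taken across DMRs within each sample and step b as a quantile taken across samples within each DMR, and it takes the low end as hypo; those readings, the helper name `quantile_status`, and the random toy data are assumptions, not the project's actual implementation.

```python
import numpy as np
import pandas as pd

def quantile_status(frac, cov, axis, min_cov=5, quantile=0.1):
    """Label entries -1 (hypo), 1 (hyper) or 0 using quantiles taken along `axis`."""
    masked = frac.where(cov > min_cov)          # low-coverage entries become NaN
    low = masked.quantile(quantile, axis=axis)  # NaN values are skipped
    high = masked.quantile(1 - quantile, axis=axis)
    other = 1 - axis                            # axis the thresholds align on
    status = np.zeros(frac.shape, dtype=int)
    status[masked.le(low, axis=other).to_numpy()] = -1
    status[masked.ge(high, axis=other).to_numpy()] = 1
    return pd.DataFrame(status, index=frac.index, columns=frac.columns)

# Toy data: 20 samples (rows) by 50 DMRs (columns)
rng = np.random.default_rng(0)
frac = pd.DataFrame(rng.random((20, 50)))
cov = pd.DataFrame(rng.integers(0, 30, size=(20, 50)))

per_sample = quantile_status(frac, cov, axis=1)  # step a: within each sample
per_dmr = quantile_status(frac, cov, axis=0)     # step b: within each DMR
# step c: keep a status only where a and b agree, otherwise 0
combined = per_sample.where(per_sample == per_dmr, 0)
```

Rows of `combined` then give, for each sample, the hypo- and hyper-DMR sets that would go into the enrichment in step 3.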

Comments after completion

CisTarget hypo enrichment shows good cell type specificity, while hyper has little enrichment. DEM does not work well: motifs are uniformly enriched, with no clear cell type specificity. The final motif hits will be the CisTarget hypo-DMR motif hits.

TF Example

tf_nes = dmr_ds["tf_nes"].to_pandas() # use NES per TF, agg from max(motifs)
download

cemba.get_gene_fracs() # TF mCH frac
download
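
For illustration, the "NES per TF, agg from max(motifs)" aggregation could look like the sketch below. The motif names, TF assignments, cluster names, and NES values are made up, and a one-to-one motif-to-TF mapping is assumed for simplicity (in the real annotation a motif can map to several TFs).

```python
import pandas as pd

# Hypothetical motif-by-cluster NES table
motif_nes = pd.DataFrame(
    {"clusterA": [3.2, 5.1, 0.4], "clusterB": [4.0, 1.0, 2.5]},
    index=["motif1", "motif2", "motif3"],
)
# Hypothetical motif -> TF annotation
motif_to_tf = pd.Series(
    {"motif1": "Neurod2", "motif2": "Neurod2", "motif3": "Sox10"}
)

# For each TF, take the maximum NES over all motifs annotated to it
tf_nes_toy = motif_nes.groupby(motif_to_tf).max()
print(tf_nes_toy)
```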

How to get results

dmr_ds = cemba.get_mc_dmr_ds(add_motif_hits=True)

# final TF-by-DMR hits (or adj matrix for the TF DMR graph)
dmr_ds['tf_gene_hits']
image
rachelzeng98 commented 2 years ago

TF_GENE correlation

  1. get l4region TF and gene expression matrix:
  2. correlation: 1838 TFs, 4673 l4regions and 30370 genes
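
A minimal sketch of the correlation step, assuming both expression matrices have l4regions as rows and that Pearson correlation is computed across l4regions for every TF-gene pair. The function name `cross_pearson` and the small random matrices are placeholders, not the code used to produce the results.

```python
import numpy as np

def cross_pearson(tf_expr, gene_expr):
    """Pearson r between every TF column and every gene column (rows are l4regions)."""
    # z-score each column with the population standard deviation (ddof=0)
    tf_z = (tf_expr - tf_expr.mean(axis=0)) / tf_expr.std(axis=0)
    gene_z = (gene_expr - gene_expr.mean(axis=0)) / gene_expr.std(axis=0)
    # mean of z-score products over regions gives the TF-by-gene r matrix
    return tf_z.T @ gene_z / tf_expr.shape[0]

# Toy data: 100 regions, 4 TFs, 6 genes
rng = np.random.default_rng(0)
tf_expr = rng.normal(size=(100, 4))
gene_expr = rng.normal(size=(100, 6))

corr = cross_pearson(tf_expr, gene_expr)
print(corr.shape)
```

Computing only the TF-by-gene block this way avoids building the full (TF + gene) square correlation matrix, which matters at 1838 TFs by 30370 genes.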

How the results look

  1. histplot showing the distribution of Pearson correlation values

    Screen Shot 2022-10-08 at 18 07 07
  2. scatterplot showing TF and Gene expression

    Screen Shot 2022-10-08 at 18 08 03
  3. TF and Gene expression plot on l4region clusters

    Screen Shot 2022-10-08 at 18 09 22

How to get results

All results are combined into a zarr store: ecker-rachel-analysis/tf_gene/TF_Gene_Corelation.zarr

Screen Shot 2022-10-08 at 18 05 49