kundajelab / DMSO


TODO for paper #1: Chromatin state distribution plot by cell cycle phase & treatment status #10

Open annashcherbina opened 6 years ago

annashcherbina commented 6 years ago

The goal is to check whether the proportion of peaks in each ChromHMM chromatin state shifts with cell cycle phase and DMSO treatment.
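A minimal sketch of that comparison, assuming each peak has already been labelled with a condition (cell cycle phase plus treatment) and a chromatin state. The function name, the condition strings, and the state labels in the usage note are hypothetical and are not taken from the project code.

```python
from collections import Counter, defaultdict


def state_proportions(peaks):
    """Return {condition: {state: fraction of that condition's peaks}}.

    `peaks` is an iterable of (condition, state) pairs, one pair per peak.
    Dividing by the per-condition total makes conditions with different
    numbers of peaks comparable.
    """
    counts = defaultdict(Counter)
    for condition, state in peaks:
        counts[condition][state] += 1

    proportions = {}
    for condition, state_counts in counts.items():
        total = sum(state_counts.values())
        proportions[condition] = {
            state: n / total for state, n in state_counts.items()
        }
    return proportions
```

For example, `state_proportions([("G1_control", "Quies"), ("G1_control", "TssA"), ("G1_DMSO", "TssA")])` gives one dictionary of state fractions per condition, and each condition's fractions sum to 1.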

annashcherbina commented 6 years ago

[Figures: chromatin state distribution for peaks, normalized; chromatin state distribution for peaks, not normalized]

annashcherbina commented 6 years ago

I intersected the ChIP-seq peaks with the differential ATAC-seq peaks, and annotated them by ATAC-seq peak chromatin state (15-state model):

chromatin_state_chipseq_vs_atac_seq.xlsx

For the Active TSS sites, we are seeing an interesting dynamic for the H3K27ac and H3K27me3 marks: where one goes up, the other goes down. This makes sense.

However, most of the ATAC-seq peaks fall in the Quiescent state and do not intersect any H3K27ac, H3K4me3, or H3K27me3 peaks. This makes sense given that the state is quiescent.

BUT this doesn't make sense given the HOMER and browser results. HOMER returns many significant motifs in the quiescent-state peaks (see slides #26 - #29): https://docs.google.com/presentation/d/11tsBKPbJ3LMEk1kYkqe2A_xl6FJFoUmooEcWJDWRf5s/edit#slide=id.g2b0ac58bc2_0_5

Additionally, the peaks truly look differential on the browser. For example: http://epigenomegateway.wustl.edu/browser/?genome=hg19&session=Ih4Zot1FHo&statusId=1160268760 (I have many other examples like this and can post them if of interest)

annashcherbina commented 6 years ago

I have checked the 25-state model, the 50-state model, and Jason Ernst's 10-factor/15-state model to determine how many of the quiescent peaks are actually insulators that contain CTCF binding sites. The main takeaway is that CTCF accounts for roughly 25% of these peaks, but the remaining 75% are inconclusive: they show up as "Quiescent" (i.e. no marks present) in the 10-factor and 25-state models, but seem to have H3K27me3 (and no other marks) in the 50-state model. This makes little sense, because the 25-state model also profiles H3K27me3, yet few peaks fall in its corresponding state.

For example: [nine images]

The peaks in the Quiescent state have the following motif enrichments from HOMER: [image] (the strong presence of CTCF is explained by the 10-factor model, but the other motifs are hard to justify).

annashcherbina commented 6 years ago

Browser session with ATAC-seq data, ChromHMM 15-, 18-, and 25-state models, and TF ChIP-seq for 45 marks:

http://epigenomegateway.wustl.edu/browser/?genome=hg19&session=iAn8ZGdVAb&statusId=1875999043

It appears from the 25-state model that many of the peaks showing up in the "quiescent" state actually partially overlap a weak enhancer. I am going to check whether the issue is bedtools intersect not behaving as expected. Also, this is the H1 cell line, but we are using H9 for the project, so the overlap with the weak enhancer regions may not be exact?

akundaje commented 6 years ago

Expand your peaks by 400 bp on each side, because the accessible site is often in a nucleosome-free region with no histone marks.

Anshul.
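A minimal sketch of the suggested expansion, assuming peaks are BED-style (0-based, half-open) intervals. The function name and the optional `chrom_sizes` mapping are hypothetical; clamping at chromosome ends is an added assumption, not something stated in the thread.

```python
def expand_peak(chrom, start, end, flank=400, chrom_sizes=None):
    """Widen a BED-style interval by `flank` bp on each side.

    The start is clamped at 0. If `chrom_sizes` (a dict of chromosome
    name -> length) is given, the end is clamped at the chromosome length.
    """
    new_start = max(0, start - flank)
    new_end = end + flank
    if chrom_sizes is not None and chrom in chrom_sizes:
        new_end = min(new_end, chrom_sizes[chrom])
    return chrom, new_start, new_end
```

On the command line, `bedtools slop` with `-b 400` and a genome file performs the same kind of symmetric expansion.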

(In reply to annashcherbina's comment of Nov 30, 2017, 7:18 AM, above.)

annashcherbina commented 6 years ago

I added 400 bp flanks to each peak and assigned each peak that overlapped both an enhancer and a quiescent-state region to the enhancer state.
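A rough sketch of this reassignment rule, not the code actually used. It assumes chromatin state segments are available as `(chrom, start, end, state)` tuples. The state labels, the function name, and the tie-breaking rule (pick the non-quiescent state with the most overlapping bases) are assumptions for illustration.

```python
def assign_state(peak, segments, flank=400, background="Quies"):
    """Assign one chromatin state to a peak after adding flanks.

    `peak` is (chrom, start, end); `segments` is a list of
    (chrom, start, end, state) tuples. Any overlapping state other than
    `background` wins over `background`; among the remaining candidates
    the state with the most overlapping base pairs is chosen.
    Returns None if the expanded peak overlaps no segment.
    """
    chrom, start, end = peak
    start, end = max(0, start - flank), end + flank

    overlap_bp = {}
    for seg_chrom, seg_start, seg_end, state in segments:
        if seg_chrom != chrom:
            continue
        shared = min(end, seg_end) - max(start, seg_start)
        if shared > 0:
            overlap_bp[state] = overlap_bp.get(state, 0) + shared

    if not overlap_bp:
        return None
    informative = {s: bp for s, bp in overlap_bp.items() if s != background}
    candidates = informative or overlap_bp
    return max(candidates, key=candidates.get)
```

With this rule, a peak sits in the background state only when its expanded interval touches nothing else, which is the sense of "fully in the quiescent state" used here.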

This helped to clean up the data. The 25-state model yields the following distribution:

[Figure: 25state_counts]

The 10-factor model yields:

[Figure: 10factor_counts]

We are now left with only 75 peaks that are fully in the quiescent state, as determined by the 50-state, 25-state, and 10-factor models.

Running HOMER on just these 75 peaks yields an enrichment for Sox2, Sox3, and Sox10: [image]

These Sox motifs have been shown to play a role in neuronal differentiation from progenitor cells: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243056/

I have been unable to find ChIP-seq datasets for these factors in H1 (or H9), so I cannot verify this, but I think our hypothesis for the purposes of the paper is that the differential peaks in quiescent states can be explained by CTCF insulators and by Sox motifs that play a role in neuronal differentiation.