Downstream differential analysis

loganminhdang commented 2 years ago

Thank you for creating this helpful tutorial. I have a question about the differential analysis section, in which a count matrix is generated. I noticed that the rows of my count table have no gene names, only numeric indices. I suppose that to relate peaks called by SEACR (or other peak-calling software) to genes, I will need Rsubread or a similar package; however, my impression is that Rsubread requires .narrowPeak files from MACS2, which SEACR does not produce. Do you recommend that I use MACS2 for peak calling, or is there an alternative way to connect peaks to specific genes/TSSs? Thank you!

yezhengSTAT commented 2 years ago

Hello, yes, the rows of the count matrix correspond to the peak regions in "masterPeak". However, you do not have to use the peak regions. If your target is the TSS neighborhood or promoter region, you can replace masterPeak with TSS +/- 500 bp windows. If you still want to use the peak regions and find the closest genes, you can use R functions such as "distance" from the GenomicRanges package (https://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/GenomicRanges/html/nearest-methods.html) to assign the closest gene within a certain range to each peak. As for the peak caller, SEACR is generally recommended when the peak regions are expected to be long and continuous, as with H3K27me3. You may also use MACS2 if its peaks look reasonable when visualized in a genome browser.

Thanks, Ye
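
For readers unsure what "assign the closest gene within a certain range" involves, here is a minimal sketch of the logic in plain Python. It is not the GenomicRanges implementation; in R, the nearest-methods linked above (for example nearest and distanceToNearest) do this on GRanges objects. The gene table, the gene names, the helper function names, and the 10 kb cutoff are all hypothetical and chosen only for illustration. Coordinates are assumed to be closed intervals in one consistent coordinate system.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy annotation: (chrom, TSS position, strand, gene name).
# A real analysis would read these from a GTF/BED annotation file.
GENES = [
    ("chr1", 1_000, "+", "GENE_A"),
    ("chr1", 5_000, "-", "GENE_B"),
    ("chr2", 2_000, "+", "GENE_C"),
]


def tss_windows(genes, flank=500):
    """Return (chrom, start, end, gene) windows covering TSS +/- flank bp.

    These windows could stand in for the master peak list when the
    promoter region is the target.
    """
    return [(c, max(0, tss - flank), tss + flank, g) for c, tss, _strand, g in genes]


def build_index(genes):
    """Map each chromosome to parallel, position-sorted lists (positions, names)."""
    grouped = defaultdict(list)
    for chrom, tss, _strand, gene in genes:
        grouped[chrom].append((tss, gene))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        index[chrom] = ([p for p, _ in entries], [g for _, g in entries])
    return index


def nearest_gene(index, chrom, start, end, max_dist=10_000):
    """Return (gene, distance) for the TSS closest to the peak [start, end].

    Returns None when the chromosome has no genes or the closest TSS is
    farther away than max_dist. A TSS inside the peak has distance 0.
    """
    if chrom not in index:
        return None
    positions, names = index[chrom]
    i = bisect_left(positions, start)
    best = None
    # Only the TSS just upstream of the peak start (i - 1) and the first
    # TSS at or after the peak start (i) can be the nearest one.
    for j in (i - 1, i):
        if 0 <= j < len(positions):
            tss = positions[j]
            dist = 0 if start <= tss <= end else min(abs(tss - start), abs(tss - end))
            if best is None or dist < best[1]:
                best = (names[j], dist)
    if best is None or best[1] > max_dist:
        return None
    return best


if __name__ == "__main__":
    index = build_index(GENES)
    # Hypothetical peaks, e.g. rows of a master peak list.
    peaks = [("chr1", 900, 1_200), ("chr1", 3_000, 3_400), ("chr2", 50_000, 50_300)]
    for chrom, start, end in peaks:
        print(chrom, start, end, nearest_gene(index, chrom, start, end))
```

With the toy data above, the first peak overlaps the GENE_A TSS (distance 0), the second is closest to GENE_B (1,600 bp from the peak end), and the third gets no gene because the only TSS on chr2 is beyond the 10 kb cutoff. The resulting gene names can then be attached to the corresponding rows of the count matrix, since those rows follow the order of the peak list.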

loganminhdang commented 2 years ago

I managed to do peak annotation based on your advice. Many thanks!

MatteoFiumara commented 7 months ago

Hi @loganminhdang! Could you share how you managed to do the peak annotation? I am fairly new to sequencing analysis and I am struggling to figure out how to do it.

yezhengSTAT / CUTTag_tutorial

Downstream differential analysis #6