kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Steps between pipeline and diffE #129

Open raboul101 opened 6 years ago

raboul101 commented 6 years ago

I have another question about preferred ways to get from pipeline results to diffE analysis.

You previously commented, regarding differential peak 'expression': "You can use the union of naive overlap peaks across all conditions as your complete set of peaks. Quantify read counts in each peak in each of the replicates and treatments."

To create a union of naive overlaps, I took the naive_overlap.filt.narrowPeak.bb files from /out/peak/macs2/overlap/optimal_set and converted them to .bed using UCSC bigBedToBed. Then I used bedops --everything to create the union.bed file. I intend to quantify reads using featureCounts, but I'm having trouble converting the union .bed to a .gtf or .saf in order to count the reads.

Do you have a preferred way of doing this?
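
For the BED-to-SAF conversion asked about above, here is a minimal Python sketch. It assumes the SAF layout that featureCounts reads with `-F SAF`: a tab-separated file with the header `GeneID`, `Chr`, `Start`, `End`, `Strand` and 1-based inclusive coordinates, whereas BED is 0-based and half-open. The function name `bed_to_saf`, the `peak_N` identifiers, and the placeholder `+` strand are illustrative choices, not something the pipeline produces.

```python
def bed_to_saf(bed_lines, prefix="peak"):
    """Convert BED lines (0-based, half-open) to SAF rows (1-based, inclusive).

    Assumption: peaks are unstranded, so every row gets a placeholder "+"
    strand and the counting step is run in unstranded mode.
    """
    rows = ["GeneID\tChr\tStart\tEnd\tStrand"]
    n = 0
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        n += 1
        # BED start is 0-based, SAF start is 1-based; the end value is unchanged.
        rows.append(f"{prefix}_{n}\t{chrom}\t{start + 1}\t{end}\t+")
    return rows


if __name__ == "__main__":
    # Two made-up peaks, for illustration only.
    example = ["chr1\t0\t100\n", "chr2\t49\t60\tsome_name\n"]
    print("\n".join(bed_to_saf(example)))
```

The returned rows can be written to a file and passed to featureCounts together with `-F SAF`. Because the peaks carry no strand information, the default unstranded counting mode is the natural fit.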

vervacity commented 5 years ago

Hello,

We prefer to use the tagAlign files, which are BED-formatted files of the reads (i.e., each line in the file is a different read). We then use bedtools intersect with the -c option to count how many reads overlap each region in the union file. This keeps things simple for the most part.
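
On the command line, the counting step described above would look something like `bedtools intersect -a union.bed -b rep1.tagAlign.gz -c`, where both file names are hypothetical. To show what that count means, here is a small pure-Python sketch of the same idea: for each region, count the reads that share at least one base with it, using BED-style half-open intervals. The function name `count_overlaps` and the toy coordinates are made up for illustration. It is not a replacement for bedtools on real data.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_overlaps(regions, reads):
    """Count, for each region, the reads overlapping it by at least 1 bp.

    Both inputs are iterables of (chrom, start, end) tuples using BED-style
    0-based, half-open coordinates with start < end.
    """
    starts = defaultdict(list)
    ends = defaultdict(list)
    for chrom, start, end in reads:
        starts[chrom].append(start)
        ends[chrom].append(end)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()

    counts = []
    for chrom, start, end in regions:
        # Reads that begin before the region ends ...
        begin_before_end = bisect_left(starts.get(chrom, []), end)
        # ... minus reads that finish at or before the region start.
        finish_before_start = bisect_right(ends.get(chrom, []), start)
        counts.append(begin_before_end - finish_before_start)
    return counts


if __name__ == "__main__":
    # Toy data, for illustration only.
    regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
    reads = [("chr1", 90, 110), ("chr1", 150, 160), ("chr1", 200, 250),
             ("chr1", 399, 420), ("chr2", 50, 80)]
    print(count_overlaps(regions, reads))
```

One caveat when the union file was built with bedops --everything: that option keeps every input element rather than merging overlaps, so overlapping peaks stay as separate rows and one read can be counted in more than one region. Using bedops --merge instead collapses overlapping peaks into single regions.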

Also, if you haven't had a chance yet, we have a pipelines google group that may also be helpful to you: https://groups.google.com/forum/#!forum/klab_genomic_pipelines_discuss