jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License
185 stars 27 forks source link

Differences in Peaks #60

Closed genomebio-nk closed 3 years ago

genomebio-nk commented 4 years ago

Hi all,

I am fairly new to bioinformatics and have been using this wonderful program to analyze my ATAC-seq data. As a preface, I used STAR for aligning and sambamba to sort by queryname. Afterwards, I also created bigWigs from my bedgraphs (via bedtools) using the UCSC program, bedGraphtobigWig, for visualization on IGV, and subsequently Genrich for peak analysis. However, it seems that the peaks shown via my bigWigs does not line up with the peaks identified by Genrich, i.e. in some cases individual peaks from bigWigs are identified as one big peak in my narrowpeak file, or it identifies a peak where there is non shown in my bigWig (see screenshot attached).

Here is an example of the code I ran for Genrich:

Genrich -t /home/scratch/sorted_files/sample_1_trimmed_Aligned.sorted.bam,/home/scratch/sorted_files/sample_2_trimmed_Aligned.sorted.bam,/home/scratch/sorted_files/sample_3_trimmed_Aligned.sorted.bam -o /home/scratch/genrich_output/sample_ATAC_genrich.narrowPeak -j -y -r -e chrM -v -E /home/scratch/mm10.blacklist.bed

And this is what I saw in the log file:

BAM records analyzed: 118133315 To skipped refs: 1282466 (chrM) Paired alignments: 0 Unpaired alignments: 116850849 secondary alns: 18627375 PCR duplicates -- Paired aln sets: 0 duplicates: 0 (0.0%) Discordant aln sets: 0 duplicates: 0 (0.0%) Singleton aln sets: 51383350 duplicates: 16765146 (32.6%) Fragments analyzed: 34612009 Full fragments: 0 Half fragments: 34612009 (from unpaired alns) ATAC-seq cut sites: 34612009 (expanded to length 100bp)

Any help is greatly appreciated!

jsh58 commented 4 years ago

Thanks for the question. There is no screenshot attached.

If the bedGraph file wasn't generated by Genrich, that could be why the bigWig "peaks" don't match the peaks identified by Genrich. As explained here, in ATAC-seq mode Genrich interprets intervals around each fragment's 5' ends. Genrich can produce a "bedgraph-ish" file that lists the pileup values it calculated via -f <file> or -k <file> as described in this section; since you have multiple replicates, you should use -k <file>.

genomebio-nk commented 4 years ago

I apologize, here is the screenshot attached!

Screen Shot 2020-08-06 at 3 35 05 PM

I will try that!

So, can these bedgraph-ish files be used for creating peaks similar to the ones (from the bigWig files) in my screenshot?

I understand Genrich will merge two intervals into a single interval if they are close enough, is there any way to disable this feature?

jsh58 commented 4 years ago

You can check the description of the -g <int> parameter in this section of the README.