macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License

What is the difference between "absolute peak summit" and "summit position" in narrowPeak format? #617

Open mcsimenc opened 9 months ago

mcsimenc commented 9 months ago

Hi, I want to understand the difference between columns 5 and 10 in the macs3 narrowPeak output. They are different, but column 10, the "relative summit position to peak start", corresponds to what is found in the summits.bed file. In IGV, field 10 is labeled as the "peak", and field 5 is labeled as "score". Here is an example:

summits.bed

Chr1    13972   13973   peak_1  12.7509

narrowPeak

Chr1    13824   14257   peak_1  127 .   4.86034 15.5595 12.7509 148

148 = 13972 - 13824

127 = ?

Thanks so much

BED field descriptions drawn from: https://github.com/macs3-project/MACS/blob/master/docs/callpeak.md

taoliu commented 9 months ago

@mcsimenc The 5th column, according to the definition of narrowPeak, represents the peak score. In MACS, the score is the integer form of 10 x -log10(qvalue), where -log10(qvalue) is the value in the 9th column. In the callpeak.md file you refer to, you can find this description:

NAME_peaks.narrowPeak is a BED6+4 format file which contains the peak locations together with peak summit, p-value, and q-value. You can load it to the UCSC genome browser. Definitions of some specific columns are:

5th: integer score for display. It's calculated as int(-10*log10(pvalue)) or int(-10*log10(qvalue)), depending on whether -p (p-value) or -q (q-value) is used as the score cutoff. Please note that currently this value might fall outside the [0-1000] range defined in the UCSC ENCODE narrowPeak format. You can let the value saturate at 1000 (i.e. p/q-value = 10^-100) by using the following awk one-liner: awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak
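
To make the arithmetic concrete, here is a minimal Python sketch that applies the description above to the example narrowPeak line from the question. It assumes the q-value cutoff was in use (so column 5 derives from column 9) and that the int() in the description truncates. The helper names parse_narrowpeak and cap_score are hypothetical and are not part of MACS.

```python
def parse_narrowpeak(line):
    """Split one narrowPeak (BED6+4) line into named fields (hypothetical helper)."""
    f = line.split()
    return {
        "chrom": f[0],
        "start": int(f[1]),
        "end": int(f[2]),
        "name": f[3],
        "score": int(f[4]),            # column 5: integer display score
        "strand": f[5],
        "signal": float(f[6]),         # column 7
        "neg_log10_p": float(f[7]),    # column 8: -log10(pvalue)
        "neg_log10_q": float(f[8]),    # column 9: -log10(qvalue)
        "summit_offset": int(f[9]),    # column 10: summit relative to peak start
    }


def cap_score(score, upper=1000):
    """Saturate the score at `upper`, mirroring the awk one-liner above."""
    return upper if score > upper else score


# Example line from the question
peak = parse_narrowpeak(
    "Chr1    13824   14257   peak_1  127 .   4.86034 15.5595 12.7509 148"
)

# Column 5 as described: int(10 x -log10(qvalue)), assuming truncation
score_from_q = int(10 * peak["neg_log10_q"])

# Absolute summit position = peak start + column 10 (matches summits.bed start)
summit_abs = peak["start"] + peak["summit_offset"]

print(score_from_q, peak["score"])   # both values should agree
print(summit_abs)                    # start coordinate in summits.bed
print(cap_score(peak["score"]))      # unchanged unless above 1000
```

With these assumptions, the "127 = ?" in the question works out as int(10 x 12.7509), and column 10 gives back the summits.bed coordinate when added to the peak start.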