Open danli349 opened 1 year ago
Hi @danli349 ,
Thank you for trying ChIPpeakAnno to annotate your data. You may want to remove the duplicated ranges with the reduce function, or set the parameter ignore.strand to FALSE. Let me know if it does not work.
Jianhong.
Hi @jianhong
Thanks for the help.
The reduce function works.
library(ChIPpeakAnno)
gr1 <- toGRanges("macs2/MSKPCa3_STAT1_IGO_11795_B_1_peaks.narrowPeak", format="narrowPeak", header=FALSE)
gr1$score <- as.numeric(gr1$score)
## one can also try import from rtracklayer
gr2 <- toGRanges("macs2/MSKPCa3_STAT3_IGO_11795_B_2_peaks.narrowPeak", format="narrowPeak", header=FALSE)
## must keep the class exactly same as gr1$score, i.e., numeric.
gr2$score <- as.numeric(gr2$score)
gr1_reduce <- reduce(gr1, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
                     with.inframe.attrib=FALSE, ignore.strand=FALSE)
gr2_reduce <- reduce(gr2, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
                     with.inframe.attrib=FALSE, ignore.strand=FALSE)
ol <- findOverlapsOfPeaks(gr1_reduce, gr2_reduce)
## add metadata (mean of score) to the overlapping peaks
ol <- addMetadata(ol, colNames="score", FUN=mean)
But the reduce function removed the score column:
Error in addMetadata(ol, colNames = "score", FUN = mean) :
colNames: score does not exist in the metadata of all the list.
The other downstream commands run without error:
ol$peaklist[["gr1_reduce///gr2_reduce"]][1:2]
makeVennDiagram(ol, fill=c("#009E73", "#F0E442"), # circle fill color
col=c("#D55E00", "#0072B2"), #circle border color
cat.col=c("#D55E00", "#0072B2")) # label color, keep same as circle border color
How should I keep the other columns of the GRanges object when using reduce?
GRanges object with 15207 ranges and 5 metadata columns:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
MSKPCa3_STAT3_IGO_11795_B_2_peak_4 chr1 1310605-1310905 * | 15
MSKPCa3_STAT3_IGO_11795_B_2_peak_5 chr1 1368603-1368949 * | 344
MSKPCa3_STAT3_IGO_11795_B_2_peak_6 chr1 1510085-1510407 * | 67
MSKPCa3_STAT3_IGO_11795_B_2_peak_7 chr1 1617671-1617933 * | 88
MSKPCa3_STAT3_IGO_11795_B_2_peak_8 chr1 2425742-2426015 * | 185
... ... ... ... . ...
MSKPCa3_STAT3_IGO_11795_B_2_peak_15088 chrY 27789252-27789553 * | 24
MSKPCa3_STAT3_IGO_11795_B_2_peak_15089 chrY 27963502-27963771 * | 46
MSKPCa3_STAT3_IGO_11795_B_2_peak_15090a chrY 28020328-28020862 * | 27
MSKPCa3_STAT3_IGO_11795_B_2_peak_15090b chrY 28020328-28020862 * | 25
MSKPCa3_STAT3_IGO_11795_B_2_peak_15091 chrY 28025034-28025744 * | 14
signalValue pValue qValue peak
<numeric> <numeric> <numeric> <integer>
MSKPCa3_STAT3_IGO_11795_B_2_peak_4 2.11121 4.29917 1.59581 68
MSKPCa3_STAT3_IGO_11795_B_2_peak_5 6.93809 38.75010 34.49640 166
MSKPCa3_STAT3_IGO_11795_B_2_peak_6 3.68591 10.05500 6.77088 214
MSKPCa3_STAT3_IGO_11795_B_2_peak_7 3.50257 12.30720 8.88024 142
MSKPCa3_STAT3_IGO_11795_B_2_peak_8 6.32404 22.41300 18.54680 146
... ... ... ... ...
MSKPCa3_STAT3_IGO_11795_B_2_peak_15088 2.93334 5.29487 2.45271 98
MSKPCa3_STAT3_IGO_11795_B_2_peak_15089 4.03285 7.78775 4.68522 146
MSKPCa3_STAT3_IGO_11795_B_2_peak_15090a 2.90368 5.58556 2.70768 148
MSKPCa3_STAT3_IGO_11795_B_2_peak_15090b 2.65053 5.38013 2.52672 335
MSKPCa3_STAT3_IGO_11795_B_2_peak_15091 2.95052 4.05748 1.40397 536
Thanks
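For readers with the same problem: the printout above contains two peaks with identical coordinates (peak_15090a and peak_15090b). If exact duplicates like these are the only issue, one option that keeps every metadata column is to drop the lower-scoring copy instead of calling reduce. The sketch below uses made-up toy data, and the object names (peaks, by_score, unique_peaks) are illustrative assumptions, not part of ChIPpeakAnno. It only handles identical ranges; peaks that overlap without being identical are left untouched.

```r
library(GenomicRanges)

# Toy peaks (hypothetical data): ranges 2 and 3 are identical, as can happen
# when a peak caller reports several summits for one region.
peaks <- GRanges(
  seqnames = c("chr1", "chrY", "chrY"),
  ranges   = IRanges(start = c(100, 500, 500), end = c(200, 600, 600)),
  score    = c(15, 25, 27)
)

# Sort by decreasing score so the first copy of each range is the best one.
by_score <- peaks[order(peaks$score, decreasing = TRUE)]

# duplicated() on a GRanges compares seqnames, ranges and strand only,
# so metadata columns do not affect which rows count as duplicates.
unique_peaks <- by_score[!duplicated(by_score)]

# Restore genomic order; metadata columns are carried along.
unique_peaks <- sort(unique_peaks)
unique_peaks
```

Because subsetting a GRanges keeps its metadata, score, signalValue and the other columns would still be present for a later addMetadata call.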
You may want to look into the with.revmap parameter of reduce.
Neither with.revmap=TRUE nor with.revmap=FALSE keeps the score column.
> STAT1_reduce <- reduce(STAT1, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=TRUE,
+ with.inframe.attrib=FALSE, ignore.strand=FALSE)
> STAT1_reduce
GRanges object with 29 ranges and 1 metadata column:
seqnames ranges strand | revmap
<Rle> <IRanges> <Rle> | <IntegerList>
[1] chr1 45196389-45196653 * | 1
[2] chr1 200457000-200457373 * | 2
[3] chr1 210547226-210547534 * | 3
[4] chr11 61158724-61158989 * | 4
[5] chr12 124573941-124574275 * | 5
... ... ... ... . ...
[25] chr5 139666182-139666431 * | 26
[26] chr6 113441601-113441881 * | 27
[27] chr8 126525375-126525654 * | 28
[28] chr8 128227353-128227673 * | 29
[29] chrX 67872864-67873167 * | 30
-------
seqinfo: 14 sequences from an unspecified genome; no seqlengths
> STAT1_reduce <- reduce(STAT1, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
+ with.inframe.attrib=FALSE, ignore.strand=FALSE)
> STAT1_reduce
GRanges object with 29 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 45196389-45196653 *
[2] chr1 200457000-200457373 *
[3] chr1 210547226-210547534 *
[4] chr11 61158724-61158989 *
[5] chr12 124573941-124574275 *
... ... ... ...
[25] chr5 139666182-139666431 *
[26] chr6 113441601-113441881 *
[27] chr8 126525375-126525654 *
[28] chr8 128227353-128227673 *
[29] chrX 67872864-67873167 *
-------
seqinfo: 14 sequences from an unspecified genome; no seqlengths
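The revmap column holds the indices of the ranges in the original GRanges that were merged into each reduced range. You can use those indices to trace back your score column. For example, to take the score of the first original peak in each merged range: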
score <- vapply(STAT1_reduce$revmap, FUN=function(.id) STAT1[.id]$score[1], FUN.VALUE=numeric(1L))
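The one-liner above keeps only the first score of each merged group. If a summary over all merged peaks is preferred, the same revmap indices can be used to aggregate. A minimal sketch on toy data follows; the object names (peaks, merged) and the choice of mean are assumptions for illustration, and any function returning a single number could be substituted.

```r
library(GenomicRanges)

# Toy peaks (hypothetical data): the first two ranges overlap and get merged.
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 150, 500), end = c(200, 250, 600)),
  score    = c(10, 30, 7)
)

# with.revmap=TRUE records, for each merged range, the indices of the
# input ranges that were combined into it.
merged <- reduce(peaks, with.revmap = TRUE)

# Average the original scores of every group of merged peaks.
merged$score <- vapply(
  as.list(merged$revmap),
  function(idx) mean(peaks$score[idx]),
  numeric(1L)
)

# Drop the helper column once the scores have been carried over.
merged$revmap <- NULL
merged
```

With a numeric score column restored on both reduced objects, addMetadata(ol, colNames="score", FUN=mean) should find the column it was missing in the error above.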
@jianhong Thanks a lot.
Hello:
Can you please let me know how to fix this error?
Thanks