jianhong / ChIPpeakAnno


`findOverlapsOfPeaks(gr1, gr2)` Error in FUN(X[[i]], ...) : Inputs contains duplicated ranges. please recheck your inputs. #25

Open danli349 opened 1 year ago

danli349 commented 1 year ago

Hello:

library(ChIPpeakAnno)

gr1 <- toGRanges("macs2/MSKPCa3_STAT1_IGO_11795_B_1_peaks.narrowPeak", format="narrowPeak", header=FALSE)
gr1
## one can also try import from rtracklayer

gr2 <- toGRanges("macs2/MSKPCa3_STAT3_IGO_11795_B_2_peaks.narrowPeak", format="narrowPeak", header=FALSE)
## must keep the class exactly same as gr1$score, i.e., numeric.
gr2$score <- as.numeric(gr2$score)
gr2

ol <- findOverlapsOfPeaks(gr1, gr2)

Can you please let me know how to fix this error?

Error in FUN(X[[i]], ...) : Inputs contains duplicated ranges. 
             please recheck your inputs.

Thanks

jianhong commented 1 year ago

Hi @danli349, thank you for trying ChIPpeakAnno to annotate your data. You may want to remove the duplicated ranges with the reduce function, or set the parameter ignore.strand to FALSE. Let me know if that does not work.

Jianhong.
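
Before removing anything, it can help to see which ranges are actually duplicated. The following is a minimal sketch on toy data, not the original narrowPeak files: the object `gr` and its peak names are made up for illustration, and it assumes the error is triggered by peaks that share the same seqname, start, end and strand. `duplicated()` on a GRanges compares only the ranges, not the metadata columns.

```r
library(GenomicRanges)

# Toy peaks standing in for a narrowPeak import (hypothetical values);
# the last two rows share exactly the same range.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chrY", "chrY"),
  ranges   = IRanges(start = c(100, 500, 900, 900),
                     end   = c(200, 650, 1200, 1200)),
  score    = c(15, 344, 27, 25)
)
names(gr) <- c("peak_1", "peak_2", "peak_3a", "peak_3b")

# Flag every member of a duplicated group, not only the later copies.
is_dup <- duplicated(gr) | duplicated(gr, fromLast = TRUE)
dup_peaks <- gr[is_dup]
dup_peaks
```

Running the same two lines on `gr1` and `gr2` should show which peaks need attention before calling findOverlapsOfPeaks.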

danli349 commented 1 year ago

Hi @jianhong

Thanks for the help. The reduce function works.

library(ChIPpeakAnno)

gr1 <- toGRanges("macs2/MSKPCa3_STAT1_IGO_11795_B_1_peaks.narrowPeak", format="narrowPeak", header=FALSE)
gr1$score <- as.numeric(gr1$score)
## one can also try import from rtracklayer

gr2 <- toGRanges("macs2/MSKPCa3_STAT3_IGO_11795_B_2_peaks.narrowPeak", format="narrowPeak", header=FALSE)
## must keep the class exactly same as gr1$score, i.e., numeric.
gr2$score <- as.numeric(gr2$score)

gr1_reduce <- reduce(gr1, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
                     with.inframe.attrib=FALSE, ignore.strand=FALSE)
gr2_reduce <- reduce(gr2, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
                     with.inframe.attrib=FALSE, ignore.strand=FALSE)

ol <- findOverlapsOfPeaks(gr1_reduce, gr2_reduce)
## add metadata (mean of score) to the overlapping peaks

ol <- addMetadata(ol, colNames="score", FUN=mean)

But the reduce function removed the score column, so addMetadata fails:

Error in addMetadata(ol, colNames = "score", FUN = mean) : 
  colNames:  score does not exist in the metadata of all the list.

The other downstream commands still succeed:

ol$peaklist[["gr1_reduce///gr2_reduce"]][1:2]

makeVennDiagram(ol, fill=c("#009E73", "#F0E442"), # circle fill color
                col=c("#D55E00", "#0072B2"), #circle border color
                cat.col=c("#D55E00", "#0072B2")) # label color, keep same as circle border color

How should I keep the other columns of the GRanges object when using reduce?

GRanges object with 15207 ranges and 5 metadata columns:
                                          seqnames            ranges strand |     score
                                             <Rle>         <IRanges>  <Rle> | <numeric>
       MSKPCa3_STAT3_IGO_11795_B_2_peak_4     chr1   1310605-1310905      * |        15
       MSKPCa3_STAT3_IGO_11795_B_2_peak_5     chr1   1368603-1368949      * |       344
       MSKPCa3_STAT3_IGO_11795_B_2_peak_6     chr1   1510085-1510407      * |        67
       MSKPCa3_STAT3_IGO_11795_B_2_peak_7     chr1   1617671-1617933      * |        88
       MSKPCa3_STAT3_IGO_11795_B_2_peak_8     chr1   2425742-2426015      * |       185
                                      ...      ...               ...    ... .       ...
   MSKPCa3_STAT3_IGO_11795_B_2_peak_15088     chrY 27789252-27789553      * |        24
   MSKPCa3_STAT3_IGO_11795_B_2_peak_15089     chrY 27963502-27963771      * |        46
  MSKPCa3_STAT3_IGO_11795_B_2_peak_15090a     chrY 28020328-28020862      * |        27
  MSKPCa3_STAT3_IGO_11795_B_2_peak_15090b     chrY 28020328-28020862      * |        25
   MSKPCa3_STAT3_IGO_11795_B_2_peak_15091     chrY 28025034-28025744      * |        14
                                          signalValue    pValue    qValue      peak
                                            <numeric> <numeric> <numeric> <integer>
       MSKPCa3_STAT3_IGO_11795_B_2_peak_4     2.11121   4.29917   1.59581        68
       MSKPCa3_STAT3_IGO_11795_B_2_peak_5     6.93809  38.75010  34.49640       166
       MSKPCa3_STAT3_IGO_11795_B_2_peak_6     3.68591  10.05500   6.77088       214
       MSKPCa3_STAT3_IGO_11795_B_2_peak_7     3.50257  12.30720   8.88024       142
       MSKPCa3_STAT3_IGO_11795_B_2_peak_8     6.32404  22.41300  18.54680       146
                                      ...         ...       ...       ...       ...
   MSKPCa3_STAT3_IGO_11795_B_2_peak_15088     2.93334   5.29487   2.45271        98
   MSKPCa3_STAT3_IGO_11795_B_2_peak_15089     4.03285   7.78775   4.68522       146
  MSKPCa3_STAT3_IGO_11795_B_2_peak_15090a     2.90368   5.58556   2.70768       148
  MSKPCa3_STAT3_IGO_11795_B_2_peak_15090b     2.65053   5.38013   2.52672       335
   MSKPCa3_STAT3_IGO_11795_B_2_peak_15091     2.95052   4.05748   1.40397       536

Thanks
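
In the printout above, peak_15090a and peak_15090b cover the same range (chrY 28020328-28020862) and differ only in their metadata. If exact duplicates like this pair are the only problem, one option that keeps all metadata columns is to retain a single peak per range instead of calling reduce. The sketch below uses toy data with hypothetical names and keeps the highest-scoring copy; which copy to keep is a choice made here for illustration, not something the package prescribes. Unlike reduce, this does not merge ranges that merely overlap.

```r
library(GenomicRanges)

# Toy peaks (hypothetical values); the last two rows share one range.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chrY", "chrY"),
  ranges   = IRanges(start = c(100, 500, 900, 900),
                     end   = c(200, 650, 1200, 1200)),
  score    = c(15, 344, 27, 25)
)
names(gr) <- c("peak_1", "peak_2", "peak_3a", "peak_3b")

# Put the highest score first so it is the copy that survives.
by_score <- gr[order(gr$score, decreasing = TRUE)]

# Drop later copies of any identical range; metadata columns are untouched.
dedup <- by_score[!duplicated(by_score)]

# Restore genomic order.
dedup <- sort(dedup)
dedup
```

Because the score column survives, a later call such as addMetadata(ol, colNames="score", FUN=mean) has the column it expects.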

jianhong commented 1 year ago

You may want to look into the with.revmap parameter of reduce.

danli349 commented 1 year ago

Neither with.revmap=TRUE nor with.revmap=FALSE keeps the score column.

> STAT1_reduce <- reduce(STAT1, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=TRUE,
+                      with.inframe.attrib=FALSE, ignore.strand=FALSE)
> STAT1_reduce
GRanges object with 29 ranges and 1 metadata column:
       seqnames              ranges strand |        revmap
          <Rle>           <IRanges>  <Rle> | <IntegerList>
   [1]     chr1   45196389-45196653      * |             1
   [2]     chr1 200457000-200457373      * |             2
   [3]     chr1 210547226-210547534      * |             3
   [4]    chr11   61158724-61158989      * |             4
   [5]    chr12 124573941-124574275      * |             5
   ...      ...                 ...    ... .           ...
  [25]     chr5 139666182-139666431      * |            26
  [26]     chr6 113441601-113441881      * |            27
  [27]     chr8 126525375-126525654      * |            28
  [28]     chr8 128227353-128227673      * |            29
  [29]     chrX   67872864-67873167      * |            30
  -------
  seqinfo: 14 sequences from an unspecified genome; no seqlengths
> STAT1_reduce <- reduce(STAT1, drop.empty.ranges=FALSE, min.gapwidth=1L, with.revmap=FALSE,
+                      with.inframe.attrib=FALSE, ignore.strand=FALSE)
> STAT1_reduce
GRanges object with 29 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]     chr1   45196389-45196653      *
   [2]     chr1 200457000-200457373      *
   [3]     chr1 210547226-210547534      *
   [4]    chr11   61158724-61158989      *
   [5]    chr12 124573941-124574275      *
   ...      ...                 ...    ...
  [25]     chr5 139666182-139666431      *
  [26]     chr6 113441601-113441881      *
  [27]     chr8 126525375-126525654      *
  [28]     chr8 128227353-128227673      *
  [29]     chrX   67872864-67873167      *
  -------
  seqinfo: 14 sequences from an unspecified genome; no seqlengths

jianhong commented 1 year ago

The revmap column holds the index numbers of the original GRanges. You can use those indices to trace back your score column, e.g.:

## for each reduced range, take the score of the first original range merged into it
score <- vapply(STAT1_reduce$revmap, FUN=function(.id) STAT1[.id]$score[1], FUN.VALUE=numeric(1L))
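
The one-liner above keeps the score of the first range in each merged group. When several peaks are merged, a summary such as the maximum may be preferable. The following is a minimal sketch on toy data under that assumption; `gr` and `reduced` are illustrative names, and `max` is just one possible summary (`mean` would work the same way).

```r
library(GenomicRanges)

# Toy peaks (hypothetical values); the first two overlap and will be merged.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr1"),
  ranges   = IRanges(start = c(100, 150, 500),
                     end   = c(200, 260, 650)),
  score    = c(15, 40, 344)
)

# revmap[[i]] lists the indices of the input ranges merged into reduced[i].
reduced <- reduce(gr, with.revmap = TRUE)

# Summarise the original scores per merged range and attach them as metadata.
reduced$score <- vapply(reduced$revmap,
                        function(idx) max(gr$score[idx]),
                        numeric(1L))

# The index column is no longer needed once the score has been restored.
reduced$revmap <- NULL
reduced
```

Other columns (signalValue, pValue, qValue) can be restored the same way, one vapply call per column.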

danli349 commented 1 year ago

@jianhong Thanks a lot.