jianhong / ChIPpeakAnno


findOverlapsOfPeaks error: Inputs contain duplicated ranges #1

Closed AdrijaK closed 4 years ago

AdrijaK commented 4 years ago

I am having an issue finding overlaps between two GRanges objects. The first GRanges object consists of peaks found by DiffBind and converted to GRanges.

peaks example without metadata columns:

peaks[1:6]
GRanges object with 6 ranges and 36 metadata columns:
         seqnames          ranges strand | 
            <Rle>       <IRanges>  <Rle> |       
  Peak_1     chr1 3105094-3105294      * |
  Peak_2     chr1 3221027-3221227      * |
  Peak_3     chr1 3400310-3400510      * |
  Peak_4     chr1 3426765-3426965      * |
  Peak_5     chr1 3433906-3434106      * | 
  Peak_6     chr1 3445765-3445965      * | 

The second GRanges object is derived from the GENCODE GTF:

# load .gtf into txdb object
txdb = GenomicFeatures::makeTxDbFromGFF(
  "gencode.vM23.annotation.gtf",
  format="gtf", 
  dataSource="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.annotation.gtf.gz",
  organism="Mus musculus",
  circ_seqs = "chrM"
  )

# extract transcript TSS 
GRanges_TSS = compEpiTools::TSS(txdb)

I try overlapping both peak sets, explicitly selecting unique ranges for both GRanges objects:

overlaps = findOverlapsOfPeaks(unique(peaks[1:6]), unique(GRanges_TSS))

And I get an error:

Error in FUN(X[[i]], ...) : Inputs contains duplicated ranges. 
             please recheck your inputs.

Would it be possible to pinpoint the cause of this? Thank you!
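One possible cause, which this thread does not confirm, is that `unique()` on a GRanges object compares strand as well as coordinates, while `findOverlapsOfPeaks` ignores strand by default (`ignore.strand=TRUE`). Two TSSs at the same position on opposite strands would then survive `unique()` but count as duplicates once strand is dropped. A minimal sketch of how to check for that, using made-up TSS coordinates rather than the real GENCODE data:

```r
library(GenomicRanges)

# Hypothetical TSS ranges (not taken from the thread): the first two share
# a coordinate but lie on opposite strands.
tss <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(3105200, 3105200, 3500000), width = 1),
  strand   = c("+", "-", "+")
)

# unique() treats "+" and "-" as different, so nothing is dropped here.
n_unique <- length(unique(tss))

# Drop the strand on a copy and look for duplicates again.
tss_nostrand <- tss
strand(tss_nostrand) <- "*"
dup <- duplicated(tss_nostrand)

n_unique    # 3
which(dup)  # 2
```

Running the same check on `unique(GRanges_TSS)` would show whether strand-only differences are what trips the duplicate test.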

jianhong commented 4 years ago

Hi Adrija, thank you for trying ChIPpeakAnno. In your case you are extracting transcript TSSs. findOverlapsOfPeaks assumes that every input is a set of unique peaks. Depending on the question you want to ask, you may want to use annotatePeakInBatch to find out how many of your peaks overlap a TSS. If you do want to treat the TSS regions as peaks, you can try:

TSS.rd <- reduce(GRanges_TSS, ignore.strand=TRUE)
ol <- findOverlapsOfPeaks(peaks, TSS.rd)
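To illustrate what `reduce(..., ignore.strand=TRUE)` does to a TSS set, here is a small sketch on toy data. The TSS coordinates and transcript names are invented for illustration; the peaks are the first three from the post. It also uses plain `countOverlaps` from GenomicRanges as a quick way to count peaks that hit a TSS. This is a simpler stand-in for that question, not a replacement for `annotatePeakInBatch`.

```r
library(GenomicRanges)

# Hypothetical TSS set: tx1 and tx2 share a position on opposite strands.
tss <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(3105200, 3105200, 3500000), width = 1),
  strand   = c("+", "-", "+")
)
names(tss) <- c("tx1", "tx2", "tx3")

# First three peaks from the post.
peaks <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(3105094, 3221027, 3400310),
                     end   = c(3105294, 3221227, 3400510))
)
names(peaks) <- paste0("Peak_", 1:3)

# Merge overlapping or adjacent TSS ranges regardless of strand.
tss_rd <- reduce(tss, ignore.strand = TRUE)
length(tss_rd)          # 2: tx1 and tx2 collapse into one range
is.null(names(tss_rd))  # TRUE: reduce() drops names and metadata columns

# Number of reduced TSS ranges each peak overlaps.
hits <- countOverlaps(peaks, tss_rd, ignore.strand = TRUE)
sum(hits > 0)           # 1: only Peak_1 covers a TSS in this toy set
```

Because `reduce()` discards names and metadata, the transcript identity of each TSS is lost in `TSS.rd`. If that information matters downstream, annotating the peaks against the original TSS set is likely the better route.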

Let me know what you think.


-- Yours sincerely, Jianhong Ou