shangguandong1996 / FindIT2

Cannot run calcRP_TFHit() ? #3

Closed kiddo18 closed 1 year ago

kiddo18 commented 1 year ago

Hi @shangguandong1996,

I already ran mm_geneScan on my GRanges object, but calcRP_TFHit() then fails, apparently because the GRanges object cannot be converted to a data.frame. Do you know how I can fix this error?

fullRP_hit <- calcRP_TFHit(mmAnno = mmAnno_geneScan, Txdb = Txdb, decay_dist = 1000, report_fullInfo = T)
calculating peakCenter to TSS using peak-gene pair... 2022-11-26 19:17:18
Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x), : duplicate row.names: 9462, 9463, 30400, 30401, 30402, 30403, 30404, 30405, 30406, 30407, 30408, 30409, 30410, 30411, 30412, 30413, 30414, 30415, 30416, 30417, 29117, 29111, 29116, 29107, 29105, 29110, 29109, 29113, 29118, 29108, 29114, 29112, 29115, 29119, 29106, 30420, 30421, 30422, 30423, 14381, 14383, 14385, 14403, 14413, 14411, 14388, 14406, 14405, 14393, 14414, 14401, 14397, 14390, 14400, 14392, 14412, 14399, 14394, 14387, 14386, 14391, 14408, 14409, 14407, 14402, 14389, 14404, 14395, 14398, 14410, 14382, 14384, 14396, 9309, 9315, 9310, 9311, 9316, 9312, 9314, 9313, 9307, 9308, 18968, 18954, 18979, 18951, 18965, 18958, 18969, 18948, 18949, 18981, 18955, 18961, 18953, 18982, 18963, 18976, 18971, 18977, 18975, 18950, 18972, 18980, 18957, 18974, 18973, 18978, 18970, 18964, 18952, 18966, 18967, 18946, 18960, 18962, 18956, 18959, 18947, 28876, 28872, 28871, 28875, 28873, 28870, 28869, 28874, 20152, 20157, 20150, 20156, 20160, 20148, 20153, 20151, 20149, 20159, 20168, 20161, 20

shangguandong1996 commented 1 year ago

Hi, thanks for your interest in FindIT2.

Could you share an example dataset that triggers the error? If the data is too large to upload to GitHub, you can email it to me: shangguandong@cemps.ac.cn or shangguandong1996@163.com

By the way, could you also post the output of sessionInfo()?

shangguandong1996 commented 1 year ago

Hi, I have received your data. I think the problem is that your GRanges object has duplicated names, like this: [image]

The names are used as row names, so as.data.frame reports an error when it is called on this object.

I believe that if you add one line of code, it will run fine:

library(FindIT2)
library(TxDb.Hsapiens.BioMart.igis)

GR <- readRDS("GRanges_FindIT2_processed.rds")

# You can choose your own parameters if your species is human,
# for example upstream = 100000 and downstream = 100000.
mmAnno_hm <- mm_geneScan(peak_GR = GR, Txdb = TxDb.Hsapiens.BioMart.igis)

# Adding this line drops the duplicated names so the conversion to data.frame succeeds.
names(mmAnno_hm) <- NULL

fullRP_hit <- calcRP_TFHit(mmAnno = mmAnno_hm,
                           Txdb = TxDb.Hsapiens.BioMart.igis,
                           decay_dist = 10000, report_fullInfo = TRUE)
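
To show the cause in isolation, here is a minimal sketch with a toy GRanges. The coordinates and the name "peak_a" are made up for illustration and are not taken from your data, and the exact error text may differ between GenomicRanges versions, so the conversion is wrapped in tryCatch.

```r
library(GenomicRanges)

# Toy GRanges with made-up coordinates; both ranges share the name "peak_a".
gr <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges = IRanges(start = c(1, 100), end = c(50, 150))
)
names(gr) <- c("peak_a", "peak_a")

# TRUE when at least one name is repeated.
has_dup_names <- anyDuplicated(names(gr)) > 0

# The names become data.frame row names, so duplicates can make the conversion fail.
# tryCatch captures the error message (if any) instead of stopping the script.
conversion_msg <- tryCatch(
  {
    as.data.frame(gr)
    "converted without error"
  },
  error = function(e) conditionMessage(e)
)

# Dropping the names lets as.data.frame fall back to default row names.
names(gr) <- NULL
df <- as.data.frame(gr)
```

If you need to keep the names, making them unique before the conversion, for example with names(gr) <- make.unique(names(gr)), should also avoid the clash. I have not tested that variant with calcRP_TFHit.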

By the way, I do not know how you produced your GRanges. I recommend using loadPeakFile from FindIT2, which uses rtracklayer::import to read the peak file. For example:

peak_path <- system.file("extdata", "ChIP.bed.gz", package = "FindIT2")

# This GRanges has no names, so it does not trigger the error.
> loadPeakFile(peak_path)
GRanges object with 4288 ranges and 2 metadata columns:
         seqnames            ranges strand |  feature_id     score
            <Rle>         <IRanges>  <Rle> | <character> <numeric>
     [1]     Chr5         6236-6508      * |  peak_14125        27
     [2]     Chr5         7627-8237      * |  peak_14126        51
     [3]     Chr5        9730-10211      * |  peak_14127        32
     [4]     Chr5       12693-12867      * |  peak_14128        22
     [5]     Chr5       13168-14770      * |  peak_14129       519
     ...      ...               ...    ... .         ...       ...
  [4284]     Chr5 26937822-26938526      * |  peak_18408       445
  [4285]     Chr5 26939152-26939267      * |  peak_18409        21
  [4286]     Chr5 26949581-26950335      * |  peak_18410       263
  [4287]     Chr5 26952230-26952558      * |  peak_18411        30
  [4288]     Chr5 26968877-26969091      * |  peak_18412        26
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

kiddo18 commented 1 year ago

Hi Guan Dong,

I got it to work following your suggestion, but I am not sure what to make of this RP plot.

The GRanges object was generated by DiffBind, comparing samples treated with an inhibitor of the TF vs. control.

image

Should I be using a different peak file (called by MACS2) as input to the loadPeakFile function? I have a narrowPeak file for each replicate (2 replicates) of both the drug-treated sample and the control sample.

From the tutorial, it wasn't clear to me which peak file or GRanges object to use to find influential targets of my TF by combining it with RNA-seq differential gene expression data.

Thank you so much! I opened an issue on the GitHub repository; sorry if it's redundant. I am just hoping other people can see your answer too.