slowkow / CENTIPEDE.tutorial

:bug: How to use CENTIPEDE to determine if a transcription factor is bound.
https://slowkow.github.io/CENTIPEDE.tutorial
25 stars 13 forks source link

Error in if (sum(idx) == 0) { : missing value where TRUE/FALSE needed #8

Open zhuangfjnu opened 6 years ago

zhuangfjnu commented 6 years ago

Hi, When I run the centipede_data , the result show : Error in if (sum(idx) == 0) { : missing value where TRUE/FALSE needed. Maybe it is some problem with my bam file?

Thanks Zhen

slowkow commented 6 years ago

Hi Zhen,

Thanks for opening an issue! If you could share a minimal example of code and subset of your data that reproduces this error, I can try to run it on my laptop and fix the problem.

slowkow commented 6 years ago

Please feel free to reopen this issue if you can provide a minimal example.

rdbcasillas commented 5 years ago

Hey slowkow!

I get the same error as Zhen. Here is the snippet of code and the corresponding error:


>cen_P0_rep1 <- centipede_data(bam_file = "call-bowtie2/shard-0/execution/P0_rep1_R1.merged.bam", fimo_file = "centipede-files/P0_rep1.fimo.txt.gz", pval= 1e-4, flank_size = 100)

Parsed with column specification:
cols(
  motif_id = col_character(),
  motif_alt_id = col_character(),
  sequence_name = col_character(),
  start = col_double(),
  stop = col_double(),
  strand = col_character(),
  score = col_double(),
  `p-value` = col_double(),
  `q-value` = col_logical(),
  matched_sequence = col_character()
)

|=========================================================================| 100%  824 MB

Error in if (sum(idx) == 0) { : missing value where TRUE/FALSE needed

The code ran for couple of hours before throwing this error. So seems like something happened near the end of execution. I get the same error even if I remove the parameters "flank size" and/or "pval".

Let me know if I can provide any more information or if you want me to open a new issue.

Thanks!

slowkow commented 5 years ago

Hi Vatsal, thanks for sharing your code and error. That helps.

If your session is still live, it would help if you run traceback() so we can see which line in which file caused the error.

The error might come from this code (line 163):

https://github.com/slowkow/CENTIPEDE.tutorial/blob/12811e78f035510c5d021bf5f4295be3fd935ffa/R/functions.R#L151-L165

If you can share data and code that I can run on laptop, then I might be able to help.

Otherwise, I would suggest that you try running the code from functions.R by hand -- line by line in the R console -- and try to find out what is going wrong.

Good luck! Let me know if you figure it out!

rdbcasillas commented 5 years ago

Thanks for responding so quickly Kamil!

I did run the traceback command. Here's the output:

> traceback()
4: FUN(X[[i]], ...)
3: lapply(seq(1, length(bam_list)), function(i) {
       region <- parse_region(names(bam_list)[i])
       region_len <- abs(region$end - region$start) + 1
       item <- bam_list[[i]]
       neg_shift <- item$qwidth * as.numeric(item$strand == "-")
       item$pos <- item$pos + neg_shift
       idx <- item$pos >= region$start & item$pos <= region$end
       if (sum(idx) == 0) {
           return(rep(0, 2 * region_len))
       }
       strand <- item$strand[idx]
       position <- item$pos[idx]
       is.neg <- as.numeric(strand == "-")
       j <- 1 + position - region$start + (region_len * is.neg)
       as.numeric(table(factor(j, levels = seq(1, 2 * region_len))))
   })
2: do.call(rbind, lapply(seq(1, length(bam_list)), function(i) {
       region <- parse_region(names(bam_list)[i])
       region_len <- abs(region$end - region$start) + 1
       item <- bam_list[[i]]
       neg_shift <- item$qwidth * as.numeric(item$strand == "-")
       item$pos <- item$pos + neg_shift
       idx <- item$pos >= region$start & item$pos <= region$end
       if (sum(idx) == 0) {
           return(rep(0, 2 * region_len))
       }
       strand <- item$strand[idx]
       position <- item$pos[idx]
       is.neg <- as.numeric(strand == "-")
       j <- 1 + position - region$start + (region_len * is.neg)
       as.numeric(table(factor(j, levels = seq(1, 2 * region_len))))
   }))
1: centipede_data(bam_file = "call-bowtie2/shard-0/execution/P0_rep1_R1.merged.bam", 
       fimo_file = "centipede-files/P0_rep1.fimo.txt.gz", pval = 1e-04, 
       flank_size = 100)

I doubt this is very helpful for you. Since the input files are massive, would you like a trimmed down version of those files? I worry that these smaller files may disallow you to duplicate the exact error.

Otherwise, I would suggest that you try running the code from functions.R by hand -- line by line in the R console -- and try to find out what is going wrong.

I still haven't tried this so will let you know if I find something interesting.

rdbcasillas commented 5 years ago

So it turns out there was an issue with the BAM file. When I used the "deduped" BAM file, the problem seems to disappear. May be it is important for CENTIPEDE that the BAM file is without duplicates? I am not sure but at the moment that has fixed the issue.

shokohirosue commented 4 years ago

Hi all,

I have the same problem. I tried removing duplicates but my data is a paired-end I have not been able to succeed in doing so. Has anyone looked into this issue? How did you remove duplicates?

Thank you.

Shoko