Closed amirshams84 closed 6 years ago
# motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence
MA0019.1 Ddit3::Cebpa chr6 79234475 79234486 + 11.0444 8.01e-05 CATTGCAATGCC
MA0019.1 Ddit3::Cebpa chr8 89901788 89901799 - 11.4778 5.97e-05 TGATGCAATAGG
MA0019.1 Ddit3::Cebpa chr6 80106972 80106983 + 12.3111 3.26e-05 GGCTGCAATCCT
MA0019.1 Ddit3::Cebpa chr8 38938952 38938963 + 10.8778 8.66e-05 gagtgcaatggt
MA0019.1 Ddit3::Cebpa chr8 38938960 38938971 + 12.3111 3.26e-05 tggtgcaatctc
MA0019.1 Ddit3::Cebpa chr3 107522958 107522969 - 11.4444 6.11e-05 ACATGCAACCCG
MA0019.1 Ddit3::Cebpa chr7 87934132 87934143 - 10.8 9.18e-05 CGGTGCACTCGC
MA0019.1 Ddit3::Cebpa chr4 129093221 129093232 + 10.7 9.84e-05 CGGTGCAATCTG
MA0019.1 Ddit3::Cebpa chr3 120742836 120742847 - 14.3333 5.04e-06 AAATGCAATGCC
Thanks for opening this issue! I appreciate it. I probably have a bug in my code.
You might be able to solve this problem by running each line from the centipede_data()
function one line at a time.
I would be very grateful if you can explain what went wrong! If you're willing to make a pull request, I would be happy to review and accept it.
Here's the code you should try running:
https://github.com/slowkow/CENTIPEDE.tutorial/blob/master/R/functions.R#L86-#L132
# Read the FIMO output file.
sites <- read_fimo(fimo_file = fimo_file, pvalue = pvalue, ...)
# Upstream flank, the center of PWM match, and downstream flank.
sites$start <- sites$start - flank_size
sites$stop <- sites$stop + flank_size
# Index the BAM file if necessary.
bam_index_file <- sprintf("%s.bai", bam_file)
if (!file.exists(bam_index_file)) {
message("Indexing the BAM file... this may take several minutes.")
indexBam(bam_file, overwrite = FALSE)
}
# Extract reads that overlap the PWM sites.
bam_list <- Rsamtools::scanBam(
file = bam_file,
param = Rsamtools::ScanBamParam(
which = GRanges(
seqnames = sites$sequence.name,
ranges = IRanges(
start = sites$start,
end = sites$stop
)
),
what = c("strand", "pos", "qwidth")
)
)
if (length(bam_list) == 0) {
stop(sprintf("No reads fall in sites from '%s'\n", fimo_file))
}
# Convert the list of "chr:start-end" regions to a dataframe.
regions <- lapply(names(bam_list), parse_region)
regions <- data.frame(
sequence.name = unlist(sapply(regions, function(x) x["chrom"])),
start = as.numeric(unlist(sapply(regions, function(x) x["start"]))),
stop = as.numeric(unlist(sapply(regions, function(x) x["end"])))
)
regions$index <- 1:nrow(regions)
# Grab score, p.value, and q.value, but drop regions with no reads.
regions <- merge(regions, sites)
# Sort the regions by coordinate.
regions <- regions[with(regions, order(sequence.name, start, stop)),]
bam_list <- bam_list[regions$index]
Hi Kamil thanks for a quick reply from what I understand, it seems FIMO in MEME-SUITES4.12 added another column to the result which may cause an issue in the downstream analysis I want to read my fimo file with read_fimo and the same error appeared
I really appreciate any help regards Amir
On Fri, Mar 16, 2018 at 2:31 PM, Kamil Slowikowski <notifications@github.com
wrote:
Thanks for opening this issue! I appreciate it. I probably have a bug in my code.
You might be able to solve this problem by running each line from the centipede_data() function one line at a time.
I would be very grateful if you can explain what went wrong! If you're willing to make a pull request, I would be happy to review and accept it.
Here's the code you should try running:
https://github.com/slowkow/CENTIPEDE.tutorial/blob/ master/R/functions.R#L86-#L132
Read the FIMO output file.
sites <- read_fimo(fimo_file = fimo_file, pvalue = pvalue, ...)
Upstream flank, the center of PWM match, and downstream flank.
sites$start <- sites$start - flank_size sites$stop <- sites$stop + flank_size
Index the BAM file if necessary.
bam_index_file <- sprintf("%s.bai", bam_file) if (!file.exists(bam_index_file)) { message("Indexing the BAM file... this may take several minutes.") indexBam(bam_file, overwrite = FALSE) }
Extract reads that overlap the PWM sites.
bam_list <- Rsamtools::scanBam( file = bam_file, param = Rsamtools::ScanBamParam( which = GRanges( seqnames = sites$sequence.name, ranges = IRanges( start = sites$start, end = sites$stop ) ), what = c("strand", "pos", "qwidth") ) )
if (length(bam_list) == 0) { stop(sprintf("No reads fall in sites from '%s'\n", fimo_file)) }
Convert the list of "chr:start-end" regions to a dataframe.
regions <- lapply(names(bam_list), parse_region) regions <- data.frame( sequence.name = unlist(sapply(regions, function(x) x["chrom"])), start = as.numeric(unlist(sapply(regions, function(x) x["start"]))), stop = as.numeric(unlist(sapply(regions, function(x) x["end"]))) ) regions$index <- 1:nrow(regions)
Grab score, p.value, and q.value, but drop regions with no reads.
regions <- merge(regions, sites)
Sort the regions by coordinate.
regions <- regions[with(regions, order(sequence.name, start, stop)),] bam_list <- bam_list[regions$index]
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/slowkow/CENTIPEDE.tutorial/issues/10#issuecomment-373805277, or mute the thread https://github.com/notifications/unsubscribe-auth/AMtB2NjvVCkrpW89bqtgdZOT7yHJBOnzks5tfATugaJpZM4SuOp5 .
Could I ask you to please try reinstalling with
devtools::install_github("slowkow/CENTIPEDE.tutorial")
and try again?
Hi Kamil
runx1 <- centipede_data(bam_file = "../jfarber-20160119-P1/spsingh-20171019-D2/VA2.7+CCR6+CCR2+/hg38/align/rep1/AB2955-L1_R1.trim.PE2SE.bam", fimo_file = "../Centepede/runx1.txt.gz", pvalue = 1e-4,flank_size = 100)
Parsed with column specification:
cols(
#pattern name
= col_character(),
sequence name
= col_character(),
start = col_integer(),
stop = col_integer(),
strand = col_character(),
score = col_double(),
p-value
= col_double(),
q-value
= col_character(),
matched sequence
= col_character()
)
Error in if (sum(idx) == 0) { : missing value where TRUE/FALSE needed
Could I ask you to try running the code line by line? This might be a quick way to find the source of the error.
Try starting at line 92: https://github.com/slowkow/CENTIPEDE.tutorial/blob/master/R/functions.R#L92
Feel free to re-open this issue if you need more help.
I did try your tutorial with the meme-suite version4.12, I received this error while I wanted to create centipede data