honzee / RNAseqCNV

R package for large-scale CNV analysis from RNA-seq
MIT License

Error in mutate(): snvOrd must be size 0 or 1, not 2. #21

Open anashank opened 1 year ago

anashank commented 1 year ago

Hi, when I run RNAseqCNV on a VCF generated by another pipeline, I get the long error message below. The format of my VCF is also shown below; its INFO and FORMAT fields differ from those of a GATK VCF. Can this VCF be used as input? If so, what changes are needed for the tool to work? Any suggestions would be helpful!

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr1 1168629 . C T . weak_evidence DP=4;MQ=216.96;FractionInformativeReads=1.000;RatioSoftClips=0.00 GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB 0/1:1.03:3,1:0.250:2,0:1,1:4:3,0,0,1:2,1,1,0

Reading in vcf file..
Extracting depth..
Extracting reference allele and alternative allele depths..
Needed information from vcf extracted
Finished reading vcf

Error in mutate():
ℹ In argument: snvOrd = 1:n().
Caused by error:
! snvOrd must be size 0 or 1, not 2.

Traceback:

  1. RNAseqCNV_wrapper(config = "/data/config", metadata = "/data/metadata", snv_format = "vcf")
  2. calc_chrom_lvl(smpSNPdata.tmp)
  3. smpSNPdata.tmp %>% group_by(chr) %>% arrange(chr, desc(depth)) %>% mutate(snvOrd = 1:n()) %>% filter(snvOrd <= 1000) %>% mutate(snvNum = n(), peak_max = densityMaxY(maf), peak = findPeak(maf), peakCol = ifelse(between(peak, 0.42, 0.58), "black", "red"), peakdist = find_peak_dist(maf)) %>% ungroup() %>% mutate(chr = factor(chr, levels = c(1:22, "X")))
  4. mutate(., chr = factor(chr, levels = c(1:22, "X")))
  5. ungroup(.)
  6. mutate(., snvNum = n(), peak_max = densityMaxY(maf), peak = findPeak(maf), peakCol = ifelse(between(peak, 0.42, 0.58), "black", "red"), peakdist = find_peak_dist(maf))
  7. filter(., snvOrd <= 1000)
  8. mutate(., snvOrd = 1:n())
  9. mutate.data.frame(., snvOrd = 1:n())
  10. mutate_cols(.data, dplyr_quosures(...), by)
  11. withCallingHandlers(for (i in seq_along(dots)) { poke_error_context(dots, i, mask = mask); context_poke("column", old_current_column); new_columns <- mutate_col(dots[[i]], data, mask, new_columns) }, error = dplyr_error_handler(dots = dots, mask = mask, bullets = mutate_bullets, error_call = error_call, error_class = "dplyr:::mutate_error"), warning = dplyr_warning_handler(state = warnings_state, mask = mask, error_call = error_call))
  12. mutate_col(dots[[i]], data, mask, new_columns)
  13. mask$eval_all_mutate(quo)
  14. eval()
  15. dplyr_internal_error("dplyr:::mutate_incompatible_size", list(result_size = 2L, expected_size = 0L))
  16. abort(class = c(class, "dplyr:::internal_error"), dplyr_error_data = data)
  17. signal_abort(cnd, .file)
  18. signalCondition(cnd)
  19. (function (cnd) { local_error_context(dots, i = frame[[i_sym]], mask = mask); if (inherits(cnd, "dplyr:::internal_error")) { parent <- error_cnd(message = bullets(cnd)) } else { parent <- cnd }; message <- c(cnd_bullet_header(action), i = if (has_active_group_context(mask)) cnd_bullet_cur_group_label()); abort(message, class = error_class, parent = parent, call = error_call) })(structure(list(message = "", trace = <rlang_trace>, parent = NULL, dplyr_error_data = list(result_size = 2L, expected_size = 0L), call = dplyr_internal_error("dplyr:::mutate_incompatible_size", list(result_size = 2L, expected_size = 0L)), use_cli_format = TRUE), class = c("dplyr:::mutate_incompatible_size", "dplyr:::internal_error", "rlang_error", "error", "condition")))
  20. abort(message, class = error_class, parent = parent, call = error_call)
  21. signal_abort(cnd, .file)
honzee commented 1 year ago

Hi,

I think the difference in VCF input format causes this error. RNAseqCNV expects the VCF format produced by GATK. If you compare your file with the accepted format described in our README (https://github.com/honzee/RNAseqCNV#2141-vcf-), you will see the differences.

If possible, I would generate the VCF files with GATK; otherwise, you could reformat the files you already have.
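The log in the original post shows that RNAseqCNV reads the total depth and the reference/alternative allele depths from the VCF. If you reformat a non-GATK VCF yourself, one robust approach is to look those values up by FORMAT key name rather than by position, because other callers order the sample fields differently. The sketch below is a hypothetical base-R helper (extract_allele_depths is not part of RNAseqCNV); it assumes AD and DP are the per-sample fields that matter and does not by itself produce the exact layout the README requires.

```r
# Hypothetical helper (not part of RNAseqCNV): extract reference depth,
# alternative depth and total depth from VCF FORMAT/sample strings by key
# name, so the field order used by the variant caller does not matter.
# Assumes every record carries both an AD and a DP field.
extract_allele_depths <- function(format, sample) {
  keys   <- strsplit(format, ":", fixed = TRUE)
  values <- strsplit(sample, ":", fixed = TRUE)
  rows <- Map(function(k, v) {
    names(v) <- k                                   # label each value with its FORMAT key
    ad <- as.integer(strsplit(v[["AD"]], ",", fixed = TRUE)[[1]])
    data.frame(
      ref_depth = ad[1],                            # first AD entry is the REF allele
      alt_depth = sum(ad[-1]),                      # sum of all ALT alleles
      depth     = as.integer(v[["DP"]])
    )
  }, keys, values)
  do.call(rbind, rows)
}
```

Applied to the record from this issue, extract_allele_depths("GT:SQ:AD:AF:F1R2:F2R1:DP:SB:MB", "0/1:1.03:3,1:0.250:2,0:1,1:4:3,0,0,1:2,1,1,0") gives ref_depth = 3, alt_depth = 1 and depth = 4, the same values a GATK-style GT:AD:DP record would carry.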

Also, you could first try reformatting the CHROM column as suggested here: https://github.com/honzee/RNAseqCNV/issues/22
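The traceback shows calc_chrom_lvl() converting the chromosome column with factor(chr, levels = c(1:22, "X")), which suggests the package works with chromosome names without a "chr" prefix, while this VCF uses "chr1". A minimal base-R sketch, assuming you rewrite the VCF text before passing it to RNAseqCNV; strip_chr_prefix_vcf and count_usable_chroms are hypothetical helpers, not package functions.

```r
# Chromosome names used by calc_chrom_lvl(), as seen in the traceback.
expected_chroms <- c(as.character(1:22), "X")

# Hypothetical helper: drop a leading "chr" from CHROM values
# ("chr1" -> "1", "chrX" -> "X").
strip_chr_prefix <- function(chrom) {
  sub("^chr", "", chrom)
}

# Hypothetical helper: apply the same rewrite to the data lines of a VCF.
# Lines starting with "#" (meta-information and the column header) are
# left unchanged, including any ##contig lines.
strip_chr_prefix_vcf <- function(lines) {
  is_body <- !startsWith(lines, "#")
  lines[is_body] <- strip_chr_prefix(lines[is_body])
  lines
}

# Hypothetical check: how many records have a chromosome name the
# package would recognise.
count_usable_chroms <- function(chrom) {
  sum(chrom %in% expected_chroms)
}
```

For example, writeLines(strip_chr_prefix_vcf(readLines("sample.vcf")), "sample.nochr.vcf") would write a converted copy (the file names are placeholders). If count_usable_chroms() returns 0 for your CHROM column, chromosome naming is a likely reason for an empty SNV table.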

Best, Jan

DzmitryGB commented 7 months ago

I've encountered the same issue when using a "custom" SNV table. In my case, the SNV filtering against the database produced an empty table. Passing the SNP_to_keep = FALSE argument to the wrapper bypasses this filtering step.
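An empty table also explains the puzzling "not 2" in the error message. Frame 15 of the traceback reports result_size = 2L and expected_size = 0L: the group being mutated has zero rows, so n() is 0 and 1:n() becomes 1:0, which in R is the two-element sequence c(1, 0) rather than an empty vector. seq_len(n()) would return a zero-length vector instead, although the underlying problem would still be the empty SNV table. A base-R sketch of the difference, plus a hypothetical pre-flight check (check_snv_table is not part of RNAseqCNV):

```r
n_rows <- 0L

# `:` counts downwards when the end is smaller than the start,
# so 1:0 has length 2 instead of 0.
print(1:n_rows)          # [1] 1 0
print(seq_len(n_rows))   # integer(0)

# Hypothetical pre-flight check: stop early with a clear message
# if filtering left no SNVs for a sample.
check_snv_table <- function(snv_table) {
  if (nrow(snv_table) == 0L) {
    stop("SNV table is empty after filtering; check chromosome names ",
         "or try SNP_to_keep = FALSE")
  }
  invisible(snv_table)
}
```

Running a check like this on each sample's SNV table before calling RNAseqCNV_wrapper() turns the dplyr internal error into a message that points at the actual cause.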

honzee commented 7 months ago

Good point, @DzmitryGB, thank you for the comment.