immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Error when loading 10X filtered_contig_annotations.csv data after resaving #183

Closed vkavaka closed 3 years ago

vkavaka commented 3 years ago

Hello and thank you very much for developing this wonderful package!

I was trying to load multiple TCR filtered_contig_annotations.csv files with the repLoad function of this package. Before loading, I opened each file in a loop with read.csv, kept only the barcodes present in my final dataset, and resaved the result in the appropriate folder. Here is the loop I used:

library(dplyr)  # provides filter()

object <- tcells
tcr_folder <- 'X:/folder/'
samples_general <- unique(object$sample)
samples <- gsub(samples_general, pattern = '.*X', replacement = '')
for (i in seq_along(samples)) {
  # Read the contig annotations for this sample
  tcr <- read.csv(paste(tcr_folder, 'TR', samples[i], '/outs/', "filtered_contig_annotations.csv", sep = ""))
  number <- filter(object@meta.data, sample == samples_general[i])$sample.effect[1]
  old_barcodes <- tcr$barcode
  # Replace the '-1' suffix so the barcodes match those in the object
  tcr$barcode <- gsub(tcr$barcode, pattern = '-1', replacement = paste('-', number, sep = ''))
  tcr$contig_id <- gsub(tcr$contig_id, pattern = '-1', replacement = paste('-', number, sep = ''))
  # Keep only the barcodes present in the final dataset
  tcr <- tcr[tcr$barcode %in% rownames(object@meta.data), ]
  write.csv(tcr, file = paste('immunarch/input3/', samples_general[i], '.csv', sep = ''), row.names = FALSE)
}

When I then tried to load the resaved files from that folder, I got the following error:

Warning message: "The following named parsers don't match the column names: "", "barcode", "is_cell", "contig_id", "high_confidence", "length", "chain", "v_gene", "d_gene", "j_gene", "c_gene", "full_length", "productive", "fwr1", "fwr1_nt", "cdr1", "cdr1_nt", "fwr2", "fwr2_nt", "cdr2", "cdr2_nt", "fwr3", "fwr3_nt", "cdr3", "cdr3_nt", "fwr4", "fwr4_nt", "reads", "umis", "raw_clonotype_id", "raw_consensus_id", "exact_subclonotype_id""

Warning message in .which_recomb_type(df[[.vgenes]]): "Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table."

Error: Assigned data toupper(df[[.nuc.seq]]) must be compatible with existing data.
x Existing data has 531 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.

Traceback:

  1. repLoad(tcr_folder)
  2. .process_batch(batches[[batch_i]], .format, .mode, .coding)
  3. .read_repertoire(.filepath, .format, .mode, .coding)
  4. parse_fun(.path, .mode)
  5. parse_repertoire(.filename, .mode = .mode, .nuc.seq = "cdr3_nt", . .aa.seq = NA, .count = "umis", .vgenes = "v_gene", .jgenes = "j_gene", . .dgenes = "d_gene", .vend = NA, .jstart = NA, .dstart = NA, . .dend = NA, .vd.insertions = NA, .dj.insertions = NA, .total.insertions = NA, . .skip = 0, .sep = ",", .add = c("chain", "barcode", "raw_clonotype_id", . "contig_id"))
  6. [[<-(*tmp*, .nuc.seq, value = character(0))
  7. [[<-.tbl_df(*tmp*, .nuc.seq, value = character(0))
  8. tbl_subassign(x, i, j, value, i_arg = i_arg, j_arg = j_arg, value_arg = value_arg)
  9. vectbl_recycle_rhs(value, fast_nrow(x), length(j), i_arg = NULL, . value_arg)
  10. withCallingHandlers(for (j in seq_along(value)) { . if (!is.null(value[[j]])) { . value[[j]] <- vec_recycle(value[[j]], nrow) . } . }, vctrs_error_recycle_incompatible_size = function(cnd) { . cnd_signal(error_assign_incompatible_size(nrow, value, j, . i_arg, value_arg)) . })
  11. vec_recycle(value[[j]], nrow)
  12. stop_recycle_incompatible_size(x_size = 0L, size = 531L, x_arg = "")
  13. stop_vctrs(x_size = x_size, y_size = size, x_arg = x_arg, class = c("vctrs_error_incompatible_size", . "vctrs_error_recycle_incompatible_size"))
  14. abort(message, class = c(class, "vctrs_error"), ...)
  15. signal_abort(cnd)
  16. signalCondition(cnd)
  17. (function (cnd) . { . cnd_signal(error_assign_incompatible_size(nrow, value, j, . i_arg, value_arg)) . })(structure(list(message = "", trace = structure(list(calls = list( . IRkernel::main(), kernel$run(), IRkernel:::handle_shell(), . executor$execute(msg), base::tryCatch(evaluate(request$content$code, . envir = .GlobalEnv, output_handler = oh, stop_on_error = 1L), . interrupt = function(cond) { . log_debug("Interrupt during execution") . interrupted <<- TRUE . }, error = .self$handle_error), base:::tryCatchList(expr, . classes, parentenv, handlers), base:::tryCatchOne(tryCatchList(expr, . names[-nh], parentenv, handlers[-nh]), names[nh], parentenv, . handlers[[nh]]), base:::doTryCatch(return(expr), name, . parentenv, handler), base:::tryCatchList(expr, names[-nh], . parentenv, handlers[-nh]), base:::tryCatchOne(expr, names, . parentenv, handlers[[1L]]), base:::doTryCatch(return(expr), . name, parentenv, handler), evaluate::evaluate(request$content$code, . envir = .GlobalEnv, output_handler = oh, stop_on_error = 1L), . evaluate:::evaluate_call(expr, parsed$src[[i]], envir = envir, . enclos = enclos, debug = debug, last = i == length(out), . use_try = stop_on_error != 2L, keep_warning = keep_warning, . keep_message = keep_message, output_handler = output_handler, . include_timing = include_timing), evaluate:::timing_fn(handle(ev <- withCallingHandlers(withVisible(eval(expr, . envir, enclos)), warning = wHandler, error = eHandler, . message = mHandler))), evaluate:::handle(ev <- withCallingHandlers(withVisible(eval(expr, . envir, enclos)), warning = wHandler, error = eHandler, . message = mHandler)), base::try(f, silent = TRUE), base::tryCatch(expr, . error = function(e) { . call <- conditionCall(e) . if (!is.null(call)) { . if (identical(call[[1L]], quote(doTryCatch))) . call <- sys.call(-4L) . dcall <- deparse(call)[1L] . prefix <- paste("Error in", dcall, ": ") . LONG <- 75L . sm <- strsplit(conditionMessage(e), "\n")[[1L]] . 
w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], . type = "w") . if (is.na(w)) . w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], . type = "b") . if (w > LONG) . prefix <- paste0(prefix, "\n ") . } . else prefix <- "Error : " . msg <- paste0(prefix, conditionMessage(e), "\n") . .Internal(seterrmessage(msg[1L])) . if (!silent && isTRUE(getOption("show.error.messages"))) { . cat(msg, file = outFile) . .Internal(printDeferredWarnings()) . } . invisible(structure(msg, class = "try-error", condition = e)) . }), base:::tryCatchList(expr, classes, parentenv, handlers), . base:::tryCatchOne(expr, names, parentenv, handlers[[1L]]), . base:::doTryCatch(return(expr), name, parentenv, handler), . base::withCallingHandlers(withVisible(eval(expr, envir, enclos)), . warning = wHandler, error = eHandler, message = mHandler), . base::withVisible(eval(expr, envir, enclos)), base::eval(expr, . envir, enclos), base::eval(expr, envir, enclos), immunarch::repLoad(tcr_folder), . immunarch:::.process_batch(batches[[batch_i]], .format, .mode, . .coding), immunarch:::.read_repertoire(.filepath, .format, . .mode, .coding), immunarch:::parse_fun(.path, .mode), . immunarch:::parse_repertoire(.filename, .mode = .mode, .nuc.seq = "cdr3_nt", . .aa.seq = NA, .count = "umis", .vgenes = "v_gene", .jgenes = "j_gene", . .dgenes = "d_gene", .vend = NA, .jstart = NA, .dstart = NA, . .dend = NA, .vd.insertions = NA, .dj.insertions = NA, . .total.insertions = NA, .skip = 0, .sep = ",", .add = c("chain", . "barcode", "raw_clonotype_id", "contig_id")), base::[[<-(*tmp*, . .nuc.seq, value = character(0)), tibble:::[[<-.tbl_df(*tmp*, . .nuc.seq, value = character(0)), tibble:::tbl_subassign(x, . i, j, value, i_arg = i_arg, j_arg = j_arg, value_arg = value_arg), . tibble:::vectbl_recycle_rhs(value, fast_nrow(x), length(j), . i_arg = NULL, value_arg), base::withCallingHandlers(for (j in seq_along(value)) { . if (!is.null(value[[j]])) { . value[[j]] <- vec_recycle(value[[j]], nrow) . } . 
}, vctrs_error_recycle_incompatible_size = function(cnd) { . cnd_signal(error_assign_incompatible_size(nrow, value, . j, i_arg, value_arg)) . }), vctrs::vec_recycle(value[[j]], nrow), vctrs:::stop_recycle_incompatible_size(x_size = 0L, . size = 531L, x_arg = ""), vctrs:::stop_vctrs(x_size = x_size, . y_size = size, x_arg = x_arg, class = c("vctrs_error_incompatible_size", . "vctrs_error_recycle_incompatible_size"))), parents = c(0L, . 1L, 2L, 3L, 4L, 5L, 6L, 7L, 6L, 9L, 10L, 4L, 12L, 13L, 13L, 15L, . 16L, 17L, 18L, 19L, 13L, 13L, 13L, 23L, 0L, 25L, 26L, 27L, 28L, . 29L, 29L, 31L, 32L, 33L, 33L, 0L, 36L), indices = 1:37), class = "rlang_trace", version = 1L), . parent = NULL, x_size = 0L, y_size = 531L, x_arg = ""), class = c("vctrs_error_incompatible_size", . "vctrs_error_recycle_incompatible_size", "vctrs_error", "rlang_error", . "error", "condition")))
  18. cnd_signal(error_assign_incompatible_size(nrow, value, j, i_arg, . value_arg))
  19. signal_abort(cnd)

The problem seems to come from opening and resaving the filtered_contig_annotations files. Here is the output I get when loading one of the original files without resaving it first (note that the column names in the warning are not wrapped in quotation marks):

== Step 1/3: loading repertoire files... ==

Processing "" ...

-- [1/1] Parsing "X:/filtered_contig_annotations.csv" -- 10x (filt.contigs)

Warning message: "The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,fwr1,fwr1_nt,cdr1,cdr1_nt,fwr2,fwr2_nt,cdr2,cdr2_nt,fwr3,fwr3_nt,cdr3,cdr3_nt,fwr4,fwr4_nt,reads,umis,raw_clonotype_id,raw_consensus_id,exact_subclonotype_id"

== Step 2/3: checking metadata files and merging files... ==

Processing "" ...

-- Metadata file not found; creating a dummy metadata...

== Step 3/3: processing paired chain data... ==

Done!

vkavaka commented 3 years ago

The issue was resolved by adding the quote = FALSE parameter to the write.csv call.
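To illustrate the difference, here is a minimal sketch in base R that writes the same small table twice, once with the default quoting and once with quote = FALSE, and then compares the header lines. The data frame and its values are made up for illustration, and the sketch only demonstrates what write.csv puts in the file. It does not reproduce how immunarch parses the file; that the quoted header is what triggers the error is an inference from the warnings above.

```r
# Toy stand-in for a filtered_contig_annotations.csv table (values are invented).
tcr <- data.frame(
  barcode = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1"),
  chain   = c("TRA", "TRB"),
  cdr3_nt = c("TGTGCC", "TGTGCA"),
  umis    = c(3L, 5L),
  stringsAsFactors = FALSE
)

quoted_path   <- tempfile(fileext = ".csv")
unquoted_path <- tempfile(fileext = ".csv")

# Default behaviour: character values and column names are wrapped in double quotes.
write.csv(tcr, file = quoted_path, row.names = FALSE)

# With quote = FALSE the column names and values are written without quotes.
write.csv(tcr, file = unquoted_path, row.names = FALSE, quote = FALSE)

# Read back only the first line of each file to compare the headers.
quoted_header   <- readLines(quoted_path, n = 1)
unquoted_header <- readLines(unquoted_path, n = 1)

print(quoted_header)
print(unquoted_header)
```

One caveat: quote = FALSE is only safe when no field contains a comma or a double quote, since those characters would otherwise break the CSV structure.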