immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

repLoad error #106

CUsdevoe closed 9 months ago

CUsdevoe commented 3 years ago

🐛 Bug

repLoad fails to load data

To Reproduce

Steps to reproduce the behavior:

  1. run repLoad with path to csv files

TRB <- repLoad(.path = "C:/Users/file path/TRB")

== Step 1/3: loading repertoire files... ==

Processing "C:/Users/file path/TRB" ...
  -- Parsing "C:/Users/file path/TRB/metadata.txt" -- metadata
  -- Parsing "C:/Users/file path/TRB/PatientA_CD4_TRB.csv" -- 10x (filt.contigs)
Error: Assigned data toupper(df[[.nuc.seq]]) must be compatible with existing data.
x Existing data has 457 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Run rlang::last_error() to see where the error occurred.
In addition: Warning messages:
1: The following named parsers don't match the column names: "barcode","is_cell","contig_id","high_confidence","length","chain","v_gene","d_gene","j_gene","c_gene","full_length","productive","cdr3","cdr3_nt","reads","umis","raw_clonotype_id","raw_consensus_id","Sample","Cell_type"
2: In .which_recomb_type(df[[.vgenes]]) :
  Can't determine the type of V(D)J recombination. No insertions will be presented in the resulting data table.

Expected behavior

I expected my filtered contigs csv files to be loaded into an immunarch object.

Additional context

I have multiplexed 10x single-cell immune profiling data (processed with Cell Ranger v4.0.0) for T cells, and I want to analyze the repertoire per patient and T cell type (CD4 vs. CD8). I split the filtered contigs data into four files: by patient according to cell barcode, and by T cell type according to surface staining with TotalSeqC antibodies (Patient A CD4, Patient A CD8, Patient B CD4, Patient B CD8). When I load those csv files, repLoad loads the data but combines the CDR3 sequences from the alpha and beta chains. When I analyze by CDR3 length I get 3 peaks (cells with only one chain, cells with 1 alpha and 1 beta chain, and cells with multiple chains, e.g. 2 alpha and 1 beta) because it takes the length to be the length of CDR3a + CDR3b for each clonotype.

I know single-cell analysis is in the works for immunarch, so as a workaround I split the filtered contigs files further by TCR chain (Patient A CD4 TRA, Patient A CD4 TRB, ...), but I encounter the error above with these csv files. I have checked the columns, and they are in the same format as in the csv files not split by TCR chain. I can load patient files containing only TRAs from MiXCR outputs, so I don't think the problem is that only one type of TCR chain is present in the file.
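
For readers following along, the per-chain split described above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `split_contigs_by_chain` is a hypothetical helper (not part of immunarch), the toy table stands in for a real filtered contigs file, and the output file names are made up. It is not confirmed that writing the files this way avoids the repLoad error.

```r
# Hypothetical helper (not part of immunarch): split a 10x filtered-contig
# table into one data frame per TCR chain, using the `chain` column.
split_contigs_by_chain <- function(contigs) {
  stopifnot("chain" %in% names(contigs))
  split(contigs, contigs$chain, drop = TRUE)
}

# Toy stand-in for a filtered contigs file; all values are invented.
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "AAAG-1", "AAAT-1"),
  chain   = c("TRA", "TRB", "TRB", "TRA"),
  cdr3    = c("CAVSDF", "CASSLGQF", "CASSPGTF", "CALSEAF"),
  stringsAsFactors = FALSE
)

by_chain <- split_contigs_by_chain(contigs)

# Write each chain to its own file. row.names = FALSE keeps the header
# identical to the input, without an extra unnamed first column.
out_dir <- tempfile("contigs_")
dir.create(out_dir)
for (chain in names(by_chain)) {
  write.csv(by_chain[[chain]],
            file.path(out_dir, paste0("PatientA_CD4_", chain, ".csv")),
            row.names = FALSE)
}

# Sanity check: re-read one file and compare its header with the original.
trb <- read.csv(file.path(out_dir, "PatientA_CD4_TRB.csv"),
                stringsAsFactors = FALSE)
header_unchanged <- identical(names(trb), names(contigs))
```

Comparing the header of each split file against the unsplit file is a cheap way to rule out an accidental change in columns during the split. It does not show how immunarch's parser treats the result.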

I also tried downloading the development version that has single cell implementation in the works. For me, it still lumped the lengths of the CDR3s together for each clonotype for my data. When I loaded the sample data via data(scdata), I did not have that issue.

Manikgarg commented 3 years ago

I am having the same issue. Did you get a solution?

vadimnazarov commented 9 months ago

Closing this issue for now. It will be implemented in the next version of Immunarch.

More details on the next version of Immunarch are here: https://b-t.cr/t/immunarch-will-significantly-evolve-but-it-will-break-things-and-we-need-your-help/1123