immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0
303 stars, 65 forks

RepLoad 10X data #374

Open TheRaspberryFox opened 1 year ago

TheRaspberryFox commented 1 year ago

Hello,

Great package. However, I am running into an issue when loading my files with repLoad. My files are filtered_contig_annotation.csv files from 10X, and repLoad appears to require the column headers fwr1, fwr1_nt, cdr1, cdr1_nt, fwr2, fwr2_nt, cdr2, cdr2_nt, fwr3, and fwr3_nt, which my files do not contain. The only column names I have are the following:

"barcode" "is_cell" "contig_id" "high_confidence" "length" "chain" "v_gene" "d_gene" "j_gene" "c_gene" "full_length" "productive" "cdr3" "cdr3_nt" "reads" "umis" "raw_clonotype_id" "raw_consensus_id"

Is there a way to load the data with only the columns that I have?

Thanks
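
For readers comparing their own files against this report, the gap can be checked with a few lines of base R. This is only a sketch: the `required` vector is copied from the list in the post above (it is the reporter's reading of what repLoad expects, not a specification taken from immunarch), and `present` is the header list the reporter pasted.

```r
# Columns the reporter says repLoad expects (copied from the post above;
# not taken from immunarch's source or documentation)
required <- c("fwr1", "fwr1_nt", "cdr1", "cdr1_nt", "fwr2", "fwr2_nt",
              "cdr2", "cdr2_nt", "fwr3", "fwr3_nt")

# Columns actually present in the reporter's file (copied from the post above).
# For a real file, the header could instead be read with, for example:
#   present <- names(read.csv("path/to/your_file.csv", nrows = 1,
#                             check.names = FALSE))
# where the path is a placeholder.
present <- c("barcode", "is_cell", "contig_id", "high_confidence", "length",
             "chain", "v_gene", "d_gene", "j_gene", "c_gene", "full_length",
             "productive", "cdr3", "cdr3_nt", "reads", "umis",
             "raw_clonotype_id", "raw_consensus_id")

# Required columns that are absent from the file
missing_cols <- setdiff(required, present)
print(missing_cols)
```

With the two lists from this post, every one of the ten framework and CDR1/CDR2 columns is reported as missing.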

margaretc-ho commented 6 months ago

I have the same question! I am following these instructions https://immunarch.com/articles/web_only/load_10x.html#prepare-10x-data and trying to read in the data downloaded from this 10X Genomics dataset (it seems like a very standard dataset). The filtered_contig_annotations file has the following columns: barcode, is_cell, contig_id, high_confidence, length, chain, v_gene, d_gene, j_gene, c_gene, full_length, productive, cdr3, cdr3_nt, reads, umis, raw_clonotype_id, raw_consensus_id, donor, origin. It does not have the columns that repLoad is looking for. Namely, I get the error:

> file_path = "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data"
> immdata_10x <- repLoad(file_path)

== Step 1/3: loading repertoire files... ==

Processing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data" ...
  -- [1/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_clonotypes.csv" -- unsupported format, skipping
  -- [2/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_consensus_annotations.csv" -- 10x (consensus)
  -- [3/5] Parsing "/Users/homc/Library/CloudStorage/TEST_BCellFlu_data/sc5p_v2_hs_B_flu_aggregated_5gex_b_vdj_b_filtered_contig_annotations.csv" -- 10x (filt.contigs)
Error in `df[, vec_names]`:
! Can't subset columns that don't exist.
✖ Columns `cdr1_nt`, `cdr1`, `cdr2_nt`, `cdr2`, `fwr1_nt`, etc. don't exist.
Backtrace:
 1. immunarch::repLoad(file_path)
 2. immunarch (local) .process_batch(batches[[batch_i]], .mode, .coding)
 3. immunarch (local) .read_repertoire(.filepath, .mode, .coding, ...)
 4. immunarch (local) parse_fun(.path, .mode, ...)
 5. immunarch:::parse_repertoire(...)
 9. tibble:::`[.tbl_df`(df, , vec_names)

Any suggestions? @vadimnazarov

margaretc-ho commented 6 months ago

It seems like this is a common issue that many others are having when trying to load 10X Genomics data: https://github.com/immunomind/immunarch/issues/363 and https://github.com/immunomind/immunarch/issues/358. We all seem to be getting this same error because of the column names. So far, there seems to be no solution.

conormcguinness9016 commented 3 weeks ago

Just wanting to add that this is still an issue - has anyone found any solutions?
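
No fix is confirmed anywhere in this thread. One workaround that might be worth trying is to pad the contig file with the missing columns as empty values before calling repLoad. The base-R sketch below does only the padding. The function name `pad_10x_contigs`, the file paths, and the choice of empty cells as filler are all hypothetical, and whether repLoad accepts a file padded this way is an untested assumption.

```r
# Hypothetical helper: add any missing columns to a 10X contig CSV as empty
# cells and write the result to a new file. The default column list is the one
# quoted in the first post of this thread.
pad_10x_contigs <- function(in_path, out_path,
                            required = c("fwr1", "fwr1_nt", "cdr1", "cdr1_nt",
                                         "fwr2", "fwr2_nt", "cdr2", "cdr2_nt",
                                         "fwr3", "fwr3_nt")) {
  # Read everything as text so values such as "True"/"False" are not
  # converted to R logicals and rewritten in a different spelling.
  contigs <- read.csv(in_path, colClasses = "character",
                      na.strings = "", check.names = FALSE)

  # Append each absent column, filled with NA (written out as an empty cell)
  missing_cols <- setdiff(required, names(contigs))
  for (col in missing_cols) {
    contigs[[col]] <- NA_character_
  }

  write.csv(contigs, out_path, row.names = FALSE, na = "")
  invisible(missing_cols)
}

# Example usage (paths are placeholders):
# pad_10x_contigs("raw/filtered_contig_annotations.csv",
#                 "padded/filtered_contig_annotations.csv")
# immdata_10x <- immunarch::repLoad("padded")  # assumption: may still fail
```

Writing to a separate directory keeps the original Cell Ranger output untouched. Any analysis that depends on framework or CDR1/CDR2 sequences would have nothing to work with in the padded columns.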