immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Using 10x genomics data #256

Open scas224 opened 2 years ago

scas224 commented 2 years ago

Hi! I'm brand new to immunarch and am preparing to use it to analyze some cellranger vdj data from 10x Genomics. I was a little confused by the "Loading 10x Genomics Data" vignette: it first says "You should use the filtered contigs csv files because they contain barcode information.", but then, in the step that loads the data into immunarch, it appears to load all of the csv files from the cellranger output, not just the filtered_contig file.

Would you recommend using all of the csv files or only one at a time?

When I followed the vignette and loaded all of the output csv files from some sample data I got this warning:

Warning messages:
1: The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,fwr1,fwr1_nt,cdr1,cdr1_nt,fwr2,fwr2_nt,cdr2,cdr2_nt,fwr3,fwr3_nt,cdr3,cdr3_nt,fwr4,fwr4_nt,reads,umis,raw_clonotype_id,raw_consensus_id,exact_subclonotype_id
2: The following named parsers don't match the column names: clonotype_id,consensus_id,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,fwr1,fwr1_nt,cdr1,cdr1_nt,fwr2,fwr2_nt,cdr2,cdr2_nt,fwr3,fwr3_nt,cdr3,cdr3_nt,fwr4,fwr4_nt,reads,umis,v_start,v_end,v_end_ref,j_start,j_start_ref,j_end,fwr1_start,fwr1_end,cdr1_start,cdr1_end,fwr2_start,fwr2_end,cdr2_start,cdr2_end,fwr3_start,fwr3_end,cdr3_start,cdr3_end,fwr4_start,fwr4_end
3: The following named parsers don't match the column names: barcode,is_cell,contig_id,high_confidence,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,fwr1,fwr1_nt,cdr1,cdr1_nt,fwr2,fwr2_nt,cdr2,cdr2_nt,fwr3,fwr3_nt,cdr3,cdr3_nt,fwr4,fwr4_nt,reads,umis,raw_clonotype_id,raw_consensus_id,exact_subclonotype_id

Is this because I loaded all of the csv files even though their structures are different?
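For reference, a minimal sketch of one way to hand immunarch only the filtered contig files, which is what the vignette sentence quoted above recommends. The helper name `find_filtered_contigs`, the `sample1` directory layout, and the empty placeholder files are assumptions made for illustration. The final `repLoad` call is left commented out because it needs real data; check `?repLoad` for the arguments your installed version supports.

```r
# Hypothetical helper (not part of immunarch): collect only the
# filtered contig annotation files below a Cell Ranger output directory.
find_filtered_contigs <- function(root) {
  list.files(
    root,
    pattern = "filtered_contig_annotations\\.csv$",
    recursive = TRUE,
    full.names = TRUE
  )
}

# Toy directory tree standing in for real Cell Ranger output (assumed layout).
root <- file.path(tempdir(), "cellranger_demo")
dir.create(file.path(root, "sample1"), recursive = TRUE, showWarnings = FALSE)
demo_files <- c(
  "filtered_contig_annotations.csv",
  "all_contig_annotations.csv",
  "consensus_annotations.csv"
)
for (f in demo_files) {
  file.create(file.path(root, "sample1", f))
}

# Only the filtered contig file is selected; the other csv files are skipped.
contig_files <- find_filtered_contigs(root)
print(basename(contig_files))

# With real data and immunarch installed, a file could then be loaded, e.g.:
# immdata <- immunarch::repLoad(contig_files[1])
```

Selecting files by name avoids mixing csv files with different column layouts in one load, which is the situation described in the question.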

Also, I was following your single-cell paired data vignette, and I'm a little confused about how to create the cluster-specific datasets. Your sample data (scdata) already contains information about the cell clusters, but how would you use this function if your immunarch input data is the output of cellranger multi? Is there a way to merge the filtered contig annotations csv with a Seurat object containing the clusters made from cellranger count?
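A minimal base-R sketch of the kind of barcode-level merge asked about here. Everything in it is toy data: the barcodes, CDR3 strings, and cluster labels are made up, and `meta` only stands in for the cell metadata of a Seurat object (for example `seurat_obj@meta.data`, whose row names are cell barcodes). It assumes the barcodes match exactly between the two sources, which may not hold if samples were merged and barcodes were prefixed or suffixed.

```r
# Toy stand-in for filtered_contig_annotations.csv (values are made up).
contigs <- data.frame(
  barcode = c("AAAC-1", "AAAC-1", "CCCG-1", "GGGT-1"),
  chain   = c("TRA", "TRB", "TRB", "TRA"),
  cdr3    = c("CAVSDF", "CASSLGF", "CASSPQF", "CALRDF"),
  stringsAsFactors = FALSE
)

# Toy stand-in for Seurat cell metadata: row names are barcodes,
# seurat_clusters holds the cluster label of each cell.
meta <- data.frame(
  seurat_clusters = factor(c("0", "1", "0")),
  row.names = c("AAAC-1", "CCCG-1", "TTTA-1")
)

# Named vector: names are barcodes, values are cluster labels.
bc_cluster <- setNames(as.character(meta$seurat_clusters), rownames(meta))

# Annotate every contig with its cell's cluster.
# Barcodes missing from the metadata get NA.
contigs$cluster <- unname(bc_cluster[contigs$barcode])

# Group barcodes by cluster, e.g. to subset the repertoire per cluster.
barcodes_by_cluster <- split(names(bc_cluster), bc_cluster)

print(contigs)
print(barcodes_by_cluster)
```

A barcode-named cluster vector like `bc_cluster` appears to have the same shape as `scdata$bc_cluster` in the single-cell vignette, so it may be usable with the cluster selection shown there; that is an assumption to verify against the package documentation.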

Also, what further analyses would you recommend for data from 10x genomics? There is a link for exploring the dataset on the loading tutorial, but it says "Page not found", and the other tutorials seem to be for data that is structured differently than the loaded 10x data.

Sorry for the seemingly simple questions, I am just brand new to this package and want to make sure I understand it properly.

MVolobueva commented 1 year ago

Hi, @scas224 !

Thank you for using our package! We know that our support for 10x Genomics data is not ideal, and we are now focused on improving the package in this area.

Also, what further analyses would you recommend for data from 10x genomics?

Thank you so much for drawing our attention to this! I believe the right link is here. We will fix it in the near future :)

As for the rest of the questions, we would be very grateful if you would consider providing an example of your 10x Genomics data. You can send the data directly to our tech support at support@immunomind.io. This will help us better understand your problem and make Immunarch better :)

Best regards, Maria Samokhina