immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

How does Immunarch define a clone? (AIRR and 10x formats) #278

Open kira-neller opened 2 years ago

kira-neller commented 2 years ago

❓ Questions and Help


Hello, I am a bioinformatician with iReceptor (https://gateway.ireceptor.org), and we are working on integrating Immunarch into our analysis platform for data in the AIRR Data Commons (ADC). Our goal is to provide users with an "overview report" for each repertoire. We have a basic implementation working, but we are uncertain how Immunarch defines a clone, both in general and specifically when mapping the AIRR format. Also, how does this definition change for single-cell data (e.g. 10x)?

This is the documentation I was able to find on this topic (https://immunarch.com/articles/v2_data.html#immunarch-data-format): “Clones” - count or number of barcodes (events, UMIs) or reads.

The above definition is a bit unclear so we are having trouble interpreting results. For example, when we compare top clones identified via the Immunarch top function vs. our in-house Statistics App, the results are different in terms of the CDR3 sequences identified and their counts. In our Statistics App, we are counting the top CDR3s in a repertoire (of rearrangements). Does Immunarch define clones as identical CDR3 sequence only, or does the definition incorporate other information (e.g. V/J gene call)?

Many thanks! Kira

Alexander230 commented 2 years ago

Hello, Kira! I'm Aleksandr Popov, a developer of the Immunarch package. Thank you for using our software!

Immunarch reads clones as they are in the input file: for AIRR, the value comes from the duplicate_count column, and for 10x, from the umis column. If you have more questions, feel free to ask them!

Best regards, Aleksandr
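
One possible source of the mismatch described in the question can be sketched in a few lines. This is a minimal illustration, not Immunarch's code: it assumes a small, made-up AIRR-style table (the sequences, gene calls, and counts are hypothetical) and compares three tallies: counting rearrangement rows per CDR3, summing duplicate_count per CDR3, and summing duplicate_count per CDR3 plus V/J call. The thread does not state how Immunarch groups rows, so the grouping keys here are assumptions chosen only to show that the choice changes the "top clone".

```python
import csv
import io
from collections import Counter

# Hypothetical AIRR-style rearrangement rows (illustrative values only).
# Column names follow the AIRR Rearrangement schema.
AIRR_TSV = (
    "junction_aa\tv_call\tj_call\tduplicate_count\n"
    "CASSLGTDTQYF\tTRBV7-9\tTRBJ2-3\t50\n"
    "CASSLGTDTQYF\tTRBV5-1\tTRBJ2-3\t5\n"
    "CASSPGQGNEQFF\tTRBV12-3\tTRBJ2-1\t10\n"
    "CASSPGQGNEQFF\tTRBV12-3\tTRBJ2-1\t10\n"
    "CASSPGQGNEQFF\tTRBV12-3\tTRBJ2-1\t10\n"
)


def tally(tsv_text, key_fields, use_duplicate_count):
    """Tally rows grouped by key_fields.

    If use_duplicate_count is True, each row contributes its
    duplicate_count value; otherwise each row contributes 1.
    """
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = tuple(row[field] for field in key_fields)
        weight = int(row["duplicate_count"]) if use_duplicate_count else 1
        counts[key] += weight
    return counts


# (a) Count rearrangement rows per CDR3 amino-acid sequence.
by_rows = tally(AIRR_TSV, ["junction_aa"], use_duplicate_count=False)

# (b) Sum duplicate_count per CDR3 amino-acid sequence.
by_dup = tally(AIRR_TSV, ["junction_aa"], use_duplicate_count=True)

# (c) Sum duplicate_count per CDR3 + V call + J call.
by_dup_vj = tally(
    AIRR_TSV, ["junction_aa", "v_call", "j_call"], use_duplicate_count=True
)

for label, counts in [("rows", by_rows), ("dup", by_dup), ("dup+VJ", by_dup_vj)]:
    print(label, counts.most_common(1))
```

In this toy table, counting rows ranks CASSPGQGNEQFF first, while summing duplicate_count ranks CASSLGTDTQYF first, so two tools can disagree on top clones purely because of which quantity they count and which columns they group by.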