immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Calculating diversity metrics for pre-computed clonotypes #141

Closed lucygarner closed 2 years ago

lucygarner commented 3 years ago

Hi,

I would like to calculate diversity metrics (e.g. the Gini coefficient) using pre-computed clonotypes, which I have calculated using paired TCR alpha and TCR beta single-cell TCR-seq data. Is there a way to do this within immunarch? As an alternative, I have seen that you can calculate the Gini coefficient using the DescTools package; however, I am not clear on what input it requires.

Best wishes, Lucy

Alexander230 commented 3 years ago

Hi, Lucy! My name is Aleksandr Popov, and I am a developer of the Immunarch package. Thank you for using our software!

You can load pre-computed clonotypes into Immunarch and calculate the Gini coefficient with the following commands:

library("immunarch")
immdata <- repLoad("/my/data/clonotypes.tsv")
gini <- repDiversity(.data=immdata$data, .method="gini")

Replace the path in the repLoad command with the path to your clonotypes file. You can find the list of file formats supported by the repLoad function here: https://immunarch.com/articles/v2_data.html#input-output-1

Here is an example of what the data looks like after loading into Immunarch. Immunarch has a built-in example dataset that can be loaded with the command data(immdata):

> library("immunarch")
> data(immdata)
> immdata$data
$`A2-i129`
# A tibble: 6,532 x 15
   Clones Proportion CDR3.nt    CDR3.aa V.name D.name J.name V.end D.start D.end
    <dbl>      <dbl> <chr>      <chr>   <chr>  <chr>  <chr>  <int>   <int> <int>
 1    173    0.0204  TGCGCCAGC… CASSQE… TRBV4… TRBD1  TRBJ2…    16      18    26
 2    163    0.0192  TGCGCCAGC… CASSYR… TRBV4… TRBD1  TRBJ2…    11      13    18
 3     66    0.00776 TGTGCCACC… CATSTN… TRBV15 TRBD1  TRBJ2…    11      16    22
 4     54    0.00635 TGTGCCACC… CATSIG… TRBV15 TRBD2  TRBJ2…    11      19    25
 5     48    0.00565 TGTGCCAGC… CASSPW… TRBV27 TRBD1  TRBJ1…    11      16    23
 6     48    0.00565 TGCGCCAGC… CASQGD… TRBV4… TRBD1  TRBJ1…     8      13    19
 7     40    0.00471 TGCGCCAGC… CASSQD… TRBV4… TRBD1  TRBJ2…    16      21    26
 8     31    0.00365 TGTGCCAGC… CASSEE… TRBV2  TRBD1  TRBJ1…    15      17    20
 9     30    0.00353 TGCGCCAGC… CASSQP… TRBV4… TRBD1  TRBJ2…    14      23    28
10     28    0.00329 TGTGCCAGC… CASSWV… TRBV6… TRBD1  TRBJ2…    12      20    25
# … with 6,522 more rows, and 5 more variables: J.start <int>, VJ.ins <dbl>,
#   VD.ins <dbl>, DJ.ins <dbl>, Sequence <lgl>

$`A2-i131`
# A tibble: 6,553 x 15
   Clones Proportion CDR3.nt    CDR3.aa V.name D.name J.name V.end D.start D.end
    <dbl>      <dbl> <chr>      <chr>   <chr>  <chr>  <chr>  <int>   <int> <int>
 1    111    0.0131  TGCAGTGCT… CSASRG… TRBV2… TRBD1  TRBJ2…    11      12    17
 2     93    0.0109  TGTGCCAGC… CASSVA… TRBV9  TRBD1  TRBJ2…    15      21    23
 3     66    0.00776 TGTGCCAGC… CASSRM… TRBV13 TRBD1  TRBJ2…    11      18    24
 4     59    0.00694 TGTGCCAGC… CASSPT… TRBV6… TRBD2  TRBJ2…    10      14    19
 5     57    0.00671 TGCGCCAGC… CASSLD… TRBV5… TRBD2  TRBJ1…    15      17    20
 6     47    0.00553 TGTGCCAGC… CASRGL… TRBV6… TRBD2  TRBJ2…    10      11    16
 7     46    0.00541 TGCAGCGTT… CSVTGV… TRBV2… TRBD1  TRBJ2…     8       9    13
 8     30    0.00353 TGTGCCAGC… CASSYL… TRBV6… TRBD2  TRBJ1…    15      17    19
 9     29    0.00341 TGTGCCAGC… CASSLA… TRBV5… TRBD1  TRBJ1…    15      21    26
10     29    0.00341 TGTGCCAGC… CASSYI… TRBV6… TRBD1  TRBJ1…    14      17    20
# … with 6,543 more rows, and 5 more variables: J.start <int>, VJ.ins <dbl>,
#   VD.ins <dbl>, DJ.ins <dbl>, Sequence <lgl>

...
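
For single-cell data, the Clones and Proportion columns shown above can be thought of as a count of cells per clonotype and that count divided by the total. The following is only a rough base-R sketch of that relationship. The per-cell labels and the Clonotype column name are made up for illustration, and the resulting table is not one of the file formats that repLoad expects.

```r
# Hypothetical per-cell clonotype labels, one entry per cell (illustrative data only).
cell_clonotypes <- c("clonotype1", "clonotype2", "clonotype1", "clonotype3",
                     "clonotype1", "clonotype2")

# Count cells per clonotype and sort from most to least abundant.
counts <- sort(table(cell_clonotypes), decreasing = TRUE)

# Assemble a small table; "Clonotype" is an illustrative column name,
# while Clones and Proportion mirror the columns printed above.
clonotype_table <- data.frame(
  Clonotype  = names(counts),
  Clones     = as.numeric(counts),
  Proportion = as.numeric(counts) / sum(counts),
  stringsAsFactors = FALSE
)

print(clonotype_table)
```

A vector of counts like the Clones column is the kind of input a Gini calculation works on.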

And this is the result of the Gini coefficient calculation for the example data:

> gini <- repDiversity(.data=immdata$data, .method="gini")
> gini
             [,1]
A2-i129 0.2297097
A2-i131 0.2252784
A2-i133 0.2513861
A2-i132 0.2017009
A4-i191 0.3863010
A4-i192 0.3064599
MS1     0.3610387
MS2     0.1561629
MS3     0.2396675
MS4     0.1224806
MS5     0.3320779
MS6     0.1278508
attr(,"class")
[1] "immunr_gini" "matrix"      "array"
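
To make it clearer what input a Gini calculation needs, here is a minimal base-R sketch that computes the coefficient from a plain numeric vector of clone counts, one value per clonotype. The function name gini_coefficient and the example vectors are made up for illustration. This uses the textbook formula and is not necessarily how Immunarch or DescTools compute it internally; some implementations apply a small-sample correction, so their values can differ slightly.

```r
# Gini coefficient of a vector of non-negative clone counts (base R only).
# gini_coefficient is a hypothetical helper, not part of any package.
gini_coefficient <- function(counts) {
  stopifnot(is.numeric(counts), length(counts) > 0,
            all(counts >= 0), sum(counts) > 0)
  x <- sort(counts)  # ascending order
  n <- length(x)
  # Textbook formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# All clonotypes equally abundant: perfectly even repertoire.
even_counts <- c(1, 1, 1, 1)
gini_coefficient(even_counts)    # 0

# One clonotype holds every cell: maximally uneven for n = 4.
skewed_counts <- c(0, 0, 0, 10)
gini_coefficient(skewed_counts)  # 0.75, i.e. (n - 1) / n
```

Because the formula divides by the total, passing counts or proportions gives the same value.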

Best regards, Aleksandr

Alexander230 commented 2 years ago

Hi, Lucy! We are closing this issue due to inactivity. You are welcome to comment and reopen the issue if there are still unresolved questions.