psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Naming of novel alleles with large numbers of SNPs? #287

Closed (scharch closed 5 years ago)

scharch commented 5 years ago

Is this a bug or did you decide that adding 7 SNPs to the allele name was too unwieldy?

 hv2-2601              1089 counts
             position   ratio       (one piece / two pieces)              0          1          2          3          4          5          6          7          8          9         10         11         12         13         14         15         16         17      
    7 snps
                  93     7.11             5.29 / 0.74                   0 / 268    0 / 234    1 / 104    0 / 81     1 / 49     1 / 35     2 / 30    15 / 43    10 / 34    12 / 30     5 / 23    12 / 28    14 / 28     5 / 14    10 / 21     4 / 11     3 / 11              
                 166     4.77             3.09 / 0.65                   0 / 268    0 / 234    0 / 104    0 / 81     0 / 49     3 / 35     0 / 30    11 / 43     7 / 34     9 / 30     3 / 23     9 / 28     5 / 28     4 / 14     7 / 21     3 / 11     3 / 11              
                 110     4.44             3.28 / 0.74                   0 / 268    2 / 234    0 / 104    0 / 81     1 / 49     0 / 35     0 / 30    13 / 43     8 / 34     9 / 30     3 / 23    10 / 28     5 / 28     5 / 14     7 / 21     2 / 11     3 / 11              
                 248     4.33             2.78 / 0.64                   0 / 268    0 / 234    0 / 104    4 / 80     1 / 49     1 / 35     0 / 30    12 / 43     8 / 34     8 / 30     4 / 23    10 / 28     5 / 28     4 / 14     7 / 21     2 / 11     3 / 11              
                 247     4.19             4.09 / 0.98                   0 / 267    0 / 234    1 / 104    0 / 81     0 / 48     0 / 35     1 / 30    13 / 43     8 / 34    10 / 30     3 / 23    11 / 28     6 / 28     7 / 14     7 / 21     2 / 11     4 / 11              
                  84     3.56             2.51 / 0.71                   0 / 268    0 / 234    1 / 104    2 / 81     0 / 49     1 / 35     1 / 30    12 / 43     7 / 34     9 / 30     3 / 23     9 / 28     4 / 28     4 / 14     7 / 21     2 / 11     3 / 11              
                  88     3.51             2.68 / 0.76                   0 / 268    2 / 234    1 / 104    0 / 81     1 / 49     1 / 35     1 / 30    13 / 43     7 / 34     8 / 30     4 / 23    10 / 28     4 / 28     4 / 14     8 / 21     2 / 11     4 / 11              
   snps       min ratio
    7          3.51     candidate 
        no equivalent gene for hv2-2601+32329       nearest is hv2-2601 (7 snps, 0 indels):
            CAGGTCACCTTGAAGGAGTCTGGTCCTGTGCTGGTGAAACCCACAGAGACCCTCACGCTGACCTGCACCGTCTCTGGGTTCTCACTCAGCAATGCTAGAATGGGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACACATTTTTTCGAATGACGAAAAATCCTACAGCACATCTCTGAAGAGCAGGCTCACCATCTCCAAGGACACCTCCAAAAGCCAGGTGGTCCTTACCATGACCA
            CAGGTCACCTTGAAGGAGTCTGGTCCTGTGCTGGTGAAACCCACAGAGACCCTCACGCTGACCTGCACCGTCTCTGGGTTCTCATTCACCAATCCTAGAATGGGTGTGAGTTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACACATTTTTTCGAGTGACGAAAAATCCTACAGCACATCTCTGAAGAGCAGGCTCACCATCTCCAAGGACACCTCCAAAAGCCAGGTGGTCCTTATGATGACCA
  new hv2-2601+32329 separated from hv2-2601 by 7 snps at:   84 (C --> T)  88 (G --> C)  93 (G --> C)  110 (C --> T)  166 (A --> G)  247 (C --> T)  248 (C --> G)
psathyrella commented 5 years ago

Not a bug! But open to arguments that it's a questionable design decision ;-). The cutoff is five SNPs (or any number of indels); beyond that, the name instead uses a hash() of the sequence. Naming ends up being seriously complicated and has quite a bit of inescapable arbitrariness (which gene should be the template, whether positions are zero-based (partis) or one-based (others)), so I would recommend viewing any gene name as a frequently useful shorthand that contains some information about the gene's relationship to other genes. The only really reliable identifier is the sequence.
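
To make the rule above concrete, here is a minimal sketch of a cutoff-based naming scheme. It is not partis's actual code. The function names, the `C84T`-style suffix format, the handling of exactly five SNPs, and the use of SHA-1 are all assumptions made for illustration. A stable digest is swapped in for the built-in `hash()` because Python 3 randomizes string hashes per process by default. Positions are zero-based, matching the log output earlier in the thread.

```python
import hashlib

MAX_NAMED_SNPS = 5  # cutoff mentioned in the thread; boundary handling is assumed


def find_snps(template, novel):
    """Return (zero-based position, template base, novel base) per mismatch."""
    if len(template) != len(novel):
        raise ValueError("lengths differ, so this is not a SNP-only change")
    return [(i, t, n) for i, (t, n) in enumerate(zip(template, novel)) if t != n]


def novel_allele_name(template_name, template, novel, has_indels=False):
    """Build a hypothetical name for an allele inferred from a template gene."""
    snps = None if has_indels else find_snps(template, novel)
    if snps is not None and not snps:
        return template_name  # identical sequence, nothing new to name
    if snps is None or len(snps) > MAX_NAMED_SNPS:
        # Too many SNPs (or any indels): fall back to a short sequence digest.
        digest = hashlib.sha1(novel.encode("ascii")).hexdigest()
        suffix = str(int(digest, 16))[:5]
    else:
        # Few SNPs: spell each one out as <old base><position><new base>.
        suffix = ".".join(f"{t}{pos}{n}" for pos, t, n in snps)
    return f"{template_name}+{suffix}"


# Toy sequences, not real germline genes.
few = novel_allele_name("toy-gene", "ACGTACGTAC", "ACCTACGTAT")
many = novel_allele_name("toy-gene", "AAAAAAAAAA", "CCCCCCAAAA")
print(few)
print(many)
```

With seven SNPs, the allele in this issue would take the hash branch of the sketch, although the digits would not be expected to match `32329` because the hash function differs. Either way, the name is only a shorthand: two tools that pick different templates or index bases will produce different names for the same sequence.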

scharch commented 5 years ago

Thanks