openvax / mhctools

Python interface to running command-line and web-based MHC binding predictors
Apache License 2.0

Fix parsing of SLA allele names #38

Closed: iskandr closed this issue 8 years ago

iskandr commented 9 years ago

Currently getting warnings when parsing the supported alleles of NetMHC/NetMHCpan, such as:

root: INFO: Skipping allele SLA-1-YDL01: Malformed MHC type 1
root: INFO: Skipping allele SLA-1-YTH: Malformed MHC type 1
root: INFO: Skipping allele SLA-2-YDL02: Malformed MHC type 2
root: INFO: Skipping allele SLA-3-CDY: Malformed MHC type 3

iskandr commented 9 years ago

From "Nomenclature for factors of the SLA system, update 2008":

SLA             The SLA region and prefix for an SLA gene
SLA-1           A particular SLA locus
SLA-1*01        A group of SLA alleles (based on DNA sequence similarity)
SLA-1*0101      A specific confirmed SLA allele
SLA-1*0101L     A low-expression SLA allele
SLA-1*0101N     A null SLA allele
SLA-1*0101Q     An SLA allele with profound sequence variations that may significantly modify the encoded protein expression
SLA-1*010101    An SLA allele that differs by a synonymous nucleotide substitution
SLA-1*01010101  An SLA allele that contains a synonymous nucleotide substitution outside the coding region that may modify the encoded protein expression
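The naming scheme in this table can be captured with a single regular expression. The following is a minimal sketch, assuming the colon-free, two-digit-field form shown above; `SLA_PATTERN`, `SLAName`, and `parse_sla_name` are hypothetical names for illustration, not part of mhctools or mhcnames. It is deliberately lenient (for example, it accepts a suffix after a group-only name).

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the 2008-style SLA names listed above:
# SLA-<locus>[*<group>[<allele>[<synonymous>[<noncoding>]]][<L|N|Q>]]
SLA_PATTERN = re.compile(
    r"^SLA-(?P<locus>\d+)"
    r"(?:\*(?P<group>\d{2})"
    r"(?P<allele>\d{2})?"
    r"(?P<synonymous>\d{2})?"
    r"(?P<noncoding>\d{2})?"
    r"(?P<suffix>[LNQ])?)?$"
)


class SLAName(NamedTuple):
    locus: str
    group: Optional[str]
    allele: Optional[str]
    synonymous: Optional[str]
    noncoding: Optional[str]
    suffix: Optional[str]  # L = low expression, N = null, Q = questionable


def parse_sla_name(name: str) -> Optional[SLAName]:
    """Split an SLA name into its nomenclature fields, or return None if it doesn't match."""
    match = SLA_PATTERN.match(name.strip())
    if match is None:
        return None
    # Group names match the NamedTuple fields; unmatched optional groups become None.
    return SLAName(**match.groupdict())
```

With this sketch, `parse_sla_name("SLA-1*0101N")` yields locus `"1"`, group `"01"`, allele `"01"`, and suffix `"N"`, while a name like `"SLA-1-YTH"` does not match and returns `None`, which mirrors the "Malformed MHC type" warnings above.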

These all look more or less standard (aside from using a number instead of a letter to identify the class I genes). Where do names like "SLA-1-YTH" come from?
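Until the origin of names like "SLA-1-YTH" is known, one option is to recognize the `SLA-<locus>-<label>` shape seen in the warnings rather than skipping it. This is a hypothetical fallback sketch, not how mhctools or mhcnames actually handle these names; since the thread does not say what labels such as `YTH` or `YDL01` mean, the label is kept as an opaque string.

```python
import re
from typing import Optional, Tuple

# Hypothetical fallback for NetMHC/NetMHCpan-style names such as "SLA-1-YTH".
# The label's meaning is undocumented here, so it is treated as an opaque identifier.
NETMHC_STYLE_SLA = re.compile(r"^SLA-(?P<locus>\d+)-(?P<label>[A-Z0-9]+)$")


def parse_netmhc_style_sla(name: str) -> Optional[Tuple[str, str]]:
    """Return (locus, label) for 'SLA-<locus>-<label>' names, else None."""
    match = NETMHC_STYLE_SLA.match(name.strip())
    if match is None:
        return None
    return match.group("locus"), match.group("label")


if __name__ == "__main__":
    # The four names from the warnings in this issue.
    for allele in ["SLA-1-YDL01", "SLA-1-YTH", "SLA-2-YDL02", "SLA-3-CDY"]:
        print(allele, parse_netmhc_style_sla(allele))
```

Keeping this separate from the standard-nomenclature parser avoids guessing how these labels map onto allele groups or numbers.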

iskandr commented 8 years ago

Moved to mhcnames: https://github.com/hammerlab/mhcnames/issues/3