k-eaton / PGroups

An HLA alleles aggregator.

Need to account for deleted alleles #11

Open sjmack opened 6 years ago

sjmack commented 6 years ago

Since the original tables were compiled, the C*17:01:01:01 allele was deleted, because it was found to be identical to the C*17:01:01:02 allele.

See the Deleted_alleles.txt table or the hla_nom.txt table. The first is easier for a human to read but harder for a machine to parse; the second is longer and harder for a human to read, but easier for a machine to parse.

For now, C*17:01:01:01 is the only CWD allele for which this has happened, but there should be some sort of check to see whether a CWD allele has been renamed (meaning that a CWD accession number was deprecated and a new accession number has been assigned), as opposed to simply having had its name extended. The new accession number/allele name should be used in the new tables.

In some cases, alleles have been deleted because they have turned out to be sequence artifacts, and no other allele (accession number) replaces them. I don't think that this will be an issue for the CWD tables.
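One way such a check could look, as a rough Python sketch. It assumes two lookups that are not part of this repo and would have to be built from hla_nom.txt or Deleted_alleles.txt: `current_names` (accession number to current allele name) and `replacements` (deleted accession number to the accession that replaced it). The function name `classify_allele` and the status labels are made up for illustration.

```python
def classify_allele(accession, old_name, current_names, replacements):
    """Return (status, accession_to_use, name_to_use) for one catalog allele.

    current_names: dict of accession -> current allele name (assumed input)
    replacements:  dict of deleted accession -> replacing accession (assumed input)
    """
    if accession in current_names:
        new_name = current_names[accession]
        if new_name == old_name:
            return ("unchanged", accession, new_name)
        # Same accession, old name plus extra fields: the name was only extended.
        if new_name.startswith(old_name + ":"):
            return ("extended", accession, new_name)
        return ("name_changed", accession, new_name)

    # The accession no longer exists, so the allele was deleted.
    new_accession = replacements.get(accession)
    if new_accession is None:
        # Deleted with no replacement, e.g. a sequence artifact.
        return ("deleted", None, None)
    return ("renamed", new_accession, current_names.get(new_accession))


# Illustrative inputs. The C*17 entries come from this issue; the
# HLA99999 entry is a made-up placeholder to show a name extension.
current_names = {
    "HLA04311": "C*17:01:01:02",
    "HLA99999": "A*99:01:01",
}
replacements = {"HLA00481": "HLA04311"}

print(classify_allele("HLA00481", "C*17:01:01:01", current_names, replacements))
print(classify_allele("HLA99999", "A*99:01", current_names, replacements))
```

Keying the check on accession numbers rather than names is what separates a rename (accession changes) from an extension (accession stays, name grows).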

sjmack commented 6 years ago

Looks like I was wrong about this issue only pertaining to one allele. There are three alleles in the CWD 2.0.0 catalog that have been renamed as of August 2018.

Locus  OrigAlleleName     NewAlleleName   Event               origAccession newAccession
B*     47:01:01:01        47:01:01:03     Sequence error      HLA00332      HLA14088
C*     17:01:01:01        17:01:01:02     Sequence error      HLA00481      HLA04311
DPA1*  02:02:01           02:07:01:01     Sequence identical  HLA00508      HLA15619

Here, the original allele name is in the catalog, and the new allele name is the "true" allele.
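As a stopgap until there is an automated check, the three renames in the table above could be applied with a small hard-coded lookup. This is only a sketch: `RENAMED_ALLELES` and `update_catalog` are hypothetical names, and the catalog is assumed to be a plain list of full allele names (locus prefix plus name).

```python
# Original catalog name -> current ("true") name, copied from the table above.
RENAMED_ALLELES = {
    "B*47:01:01:01": "B*47:01:01:03",
    "C*17:01:01:01": "C*17:01:01:02",
    "DPA1*02:02:01": "DPA1*02:07:01:01",
}


def update_catalog(allele_names):
    """Replace renamed alleles with their current names; leave others as-is."""
    return [RENAMED_ALLELES.get(name, name) for name in allele_names]


catalog = ["A*01:01:01:01", "C*17:01:01:01", "DPA1*02:02:01"]
print(update_catalog(catalog))
```

A hard-coded table like this will go stale with each IMGT/HLA release, so it is best treated as a temporary fix.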