nmdp-bioinformatics / py-ard

HLA ARD Reduction in Python
https://py-ard.org/
GNU Lesser General Public License v3.0

MACs with "allele specific" codes expansion #274

Closed pbashyal-nmdp closed 12 months ago

pbashyal-nmdp commented 1 year ago

The MAC codes that have allele field expansion should only work with the same antigen field as that in the MAC.

DRB1*07:DFJR expands to DRB1*15... alleles, but DFJR for DRB1 should only be valid as DRB1*15:DFJR.

$ pyard --expand-mac "DRB1*07:DFJR"
DRB1*15:01/DRB1*15:04/DRB1*15:07/DRB1*15:27/DRB1*15:30/DRB1*15:32/DRB1*15:33/DRB1*15:34/DRB1*15:35/DRB1*15:38/DRB1*15:45/DRB1*15:60/DRB1*15:63/DRB1*15:70/DRB1*15:71/DRB1*15:73/DRB1*15:75/DRB1*15:77/DRB1*15:78/DRB1*15:81/DRB1*15:82/DRB1*15:85/DRB1*15:92/DRB1*15:96/DRB1*15:97/DRB1*15:102/DRB1*15:104/DRB1*15:105/DRB1*15:109/DRB1*15:110/DRB1*15:116/DRB1*15:117/DRB1*15:118/DRB1*15:122

DRB1*07:DFJR should be invalid.

mmaiers-nmdp commented 1 year ago

Should this be invalid? Should this be py-ard's job? Yea, I guess so on both counts.

pbashyal-nmdp commented 1 year ago

This is the reference I've found that mentions how "allele specific" codes work.

https://bioinformatics.bethematchclinical.org/hla-resources/allele-codes/allele-code-lists

Allele combinations that contain an expression character (e.g., N = null, S = soluble, L = low expression), or which cross serologic families (e.g., DRB3*01:01/01:02/02:03/02:06) are called "allele specific" codes and can only be used with the locus and allele family that were used when it was initially created.

pbashyal-nmdp commented 1 year ago

> Should this be invalid? Should this be py-ard's job? Yea, I guess so on both counts.

We currently use only the locus name when the MAC is an "allele specific" code, so the antigen field in the request is ignored. https://github.com/nmdp-bioinformatics/py-ard/blob/master/pyard/ard.py#L569-L576
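To make the behavior described here concrete, the following is a minimal sketch and not py-ard's actual code. The `MAC_EXPANSIONS` table, the `expand_mac` function, and the `strict` flag are all hypothetical. The DFJR entry is truncated from the output shown in the issue, and AB is assumed to be a generic code. The sketch also assumes that an expansion is allele specific when its entries contain a `:`. With `strict=False` only the locus is reused, which reproduces the reported result. With `strict=True` the request is rejected when its antigen field does not appear as a 1st field in the expansion, which is a simplified stand-in for the proposed check.

```python
# Hypothetical in-memory MAC table; the DFJR entry is truncated for illustration
MAC_EXPANSIONS = {
    "DFJR": ["15:01", "15:04", "15:07"],  # allele specific: entries carry a 1st field
    "AB": ["01", "02"],  # generic (assumed): entries are 2nd fields only
}


def expand_mac(mac, strict=False):
    """Expand e.g. "DRB1*07:DFJR" into a "/"-separated allele list."""
    locus_antigen, code = mac.split(":")
    locus, antigen = locus_antigen.split("*")
    alleles = MAC_EXPANSIONS[code]
    if any(":" in allele for allele in alleles):
        # Allele specific code: only the locus is reused from the request
        first_fields = {allele.split(":")[0] for allele in alleles}
        if strict and antigen not in first_fields:
            raise ValueError(f"{mac}: {code} is not valid with {locus}*{antigen}")
        return "/".join(f"{locus}*{allele}" for allele in alleles)
    # Generic code: each entry is appended to the requested locus and antigen
    return "/".join(f"{locus_antigen}:{allele}" for allele in alleles)
```

In this sketch, `expand_mac("DRB1*07:DFJR")` silently returns DRB1*15 alleles, while `expand_mac("DRB1*07:DFJR", strict=True)` raises a `ValueError`.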

pbashyal-nmdp commented 1 year ago

Expansion rules for allele specific MACs, based on documentation by @bmilius-nmdp:

  1. The 1st field with the most allele designations in the request is the 1st field of the allele code designation.
  2. If there is a tie in the number of allele designations sharing the 1st field, the 1st field with the lowest numeric value is selected.
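The two rules above can be sketched as a small helper. This is only an illustration under stated assumptions: `select_first_field` is a hypothetical name and not part of py-ard, and the input is assumed to be a list of allele designations with an optional `LOCUS*` prefix.

```python
from collections import Counter


def select_first_field(alleles):
    """Pick the 1st field for an allele specific MAC from its allele list."""
    # Take the 1st field of each allele, ignoring an optional "LOCUS*" prefix
    first_fields = [allele.split("*")[-1].split(":")[0] for allele in alleles]
    counts = Counter(first_fields)
    # Rule 1: most allele designations wins.
    # Rule 2: on a tie, the lowest numeric 1st field wins.
    return min(counts, key=lambda field: (-counts[field], int(field)))
```

For the list DRB3*01:01/01:02/02:03/02:06 quoted earlier in the thread, both 1st fields have two designations, so the tie-break applies and the helper returns `"01"`.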