lgragert / srtr-impute-pubsaf2306

Extract HLA typing from SRTR and format data for 9-locus high resolution HLA imputation

Compute HLA-DR molecule amino acid mismatch considering DRB1/3/4/5 gene copies #3

Open alyspayn opened 10 months ago

alyspayn commented 10 months ago

This modifies how aa_mm_biopython_runmatch_genie_9loc.py is run so that DR amino acid mismatches are computed differently.

Need to add a new function to aa_matching_msf_genie.py

There is an existing function called count_AA_Mismatches_Allele() that takes as input a pair of HLA alleles from the donor and a pair of alleles from the recipient for a single locus (e.g. HLA-A or HLA-DQB1), plus an amino acid position, and counts how many unique AA residues are in the donor but not the recip. If the donor is homozygous at the locus, the mismatched residue is counted once, so the mismatch count is 1.
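A minimal sketch of the counting rule described above, not the actual code in aa_matching_msf_genie.py. The function name `count_aa_mismatches_sketch`, the `get_residue` lookup, and the residue table are hypothetical; the residues are placeholder letters chosen for illustration, not real HLA sequence data.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy table: (allele, position) -> residue.
# Placeholder letters for illustration only, NOT real HLA sequences.
TOY_RESIDUES: Dict[Tuple[str, int], str] = {
    ("A*01:01", 9): "F",
    ("A*02:01", 9): "Y",
    ("A*03:01", 9): "F",
}


def toy_get_residue(allele: str, position: int) -> str:
    """Stand-in for whatever sequence lookup the real script uses."""
    return TOY_RESIDUES[(allele, position)]


def count_aa_mismatches_sketch(
    donor1: str,
    donor2: str,
    recip1: str,
    recip2: str,
    position: int,
    get_residue: Callable[[str, int], str] = toy_get_residue,
) -> int:
    """Count unique donor residues at `position` that the recipient lacks."""
    donor = {get_residue(donor1, position), get_residue(donor2, position)}
    recip = {get_residue(recip1, position), get_residue(recip2, position)}
    # Set difference collapses a homozygous donor to a single residue,
    # so a homozygous mismatch is counted once.
    return len(donor - recip)


# Homozygous donor whose residue is absent from the recipient.
homozygous_count = count_aa_mismatches_sketch(
    "A*02:01", "A*02:01", "A*01:01", "A*03:01", 9
)
# Heterozygous donor sharing one residue with the recipient.
shared_count = count_aa_mismatches_sketch(
    "A*01:01", "A*02:01", "A*01:01", "A*03:01", 9
)
```

Using sets means the result for a single locus can only be 0, 1, or 2, which matches the description of the existing function.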

Make a new version of count_AA_Mismatches_Allele() called count_AA_Mismatches_DR(), taking as input 4 donor alleles and 4 recip alleles and returning 0-4 mismatches per DR position, treating DRB1,3,4,5 as the same gene with up to 4 copies. So a donor could have DRB1*15:01, DRB1*03:01, DRB5*01:01 and DRB3*12:01, and you would count how many unique AA residues are in the donor but not the recip.
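One way the proposed DR version could look, as a sketch under stated assumptions rather than the final implementation. The name `count_aa_mismatches_dr_sketch`, the list-based signature, the use of `None` for an absent DRB3/4/5 copy, and the `get_residue` lookup are all assumptions; the residue table holds placeholder letters, not real HLA sequence data.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Hypothetical toy table: (allele, position) -> residue.
# Placeholder letters for illustration only, NOT real HLA sequences.
TOY_DR_RESIDUES: Dict[Tuple[str, int], str] = {
    ("DRB1*15:01", 13): "R",
    ("DRB1*03:01", 13): "S",
    ("DRB5*01:01", 13): "Y",
    ("DRB3*12:01", 13): "G",
    ("DRB1*04:01", 13): "H",
    ("DRB1*07:01", 13): "Y",
    ("DRB4*01:01", 13): "C",
}


def toy_dr_residue(allele: str, position: int) -> str:
    """Stand-in for whatever sequence lookup the real script uses."""
    return TOY_DR_RESIDUES[(allele, position)]


def _residue_set(
    alleles: Iterable[Optional[str]],
    position: int,
    get_residue: Callable[[str, int], str],
) -> Set[str]:
    # Assumption: a missing DRB3/4/5 copy is passed as None and skipped.
    return {get_residue(a, position) for a in alleles if a}


def count_aa_mismatches_dr_sketch(
    donor_alleles: Iterable[Optional[str]],
    recip_alleles: Iterable[Optional[str]],
    position: int,
    get_residue: Callable[[str, int], str] = toy_dr_residue,
) -> int:
    """Pool DRB1/3/4/5 alleles per person and count donor-only residues (0-4)."""
    donor = _residue_set(donor_alleles, position, get_residue)
    recip = _residue_set(recip_alleles, position, get_residue)
    return len(donor - recip)


donor = ["DRB1*15:01", "DRB1*03:01", "DRB5*01:01", "DRB3*12:01"]
recip = ["DRB1*04:01", "DRB1*07:01", "DRB4*01:01", None]
dr_count = count_aa_mismatches_dr_sketch(donor, recip, 13)
```

Pooling all four alleles into one set before taking the difference is what lets a residue carried on the recipient's DRB3/4/5 gene cancel a mismatch from the donor's DRB1, and vice versa.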

Need different Python environments for different steps: aa_mm_biopython_runmatch.py requires pyARD, which requires an older version of Pandas, while construct_outcomes_vars.py requires Pandas 2.0.

lgragert commented 7 months ago

DRB1 and DRB3/4/5 are different HLA loci, but the proteins are very similar. DRB3/4/5 have copy number variation: an individual has at most 2 copies of a DRB3/4/5 gene.