lgragert / srtr-impute-pubsaf2306

Extract HLA typing from SRTR and format data for 9-locus high resolution HLA imputation

Compute allele mismatch categories also at the antigen recognition domain (ARD) level #12

Closed by lgragert 9 months ago

lgragert commented 10 months ago

The full allele level mismatch variables have limited utility because of limitations of the NMDP typing data / haplotype frequencies.

Modify srtr_hla_antigen_mm.py to also compute ARD-level allele mismatch.

Roll up alleles to the ARD level using pyARD (https://github.com/nmdp-bioinformatics/py-ard) with the lgx redux type, then compare the resulting strings.
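A minimal sketch of the roll-up-then-compare idea, not the code in srtr_hla_antigen_mm.py. The function names, the toy lookup table, and the choice to count donor alleles absent from the recipient are all assumptions made for illustration. The reducer is pluggable so that a pyARD call can be swapped in for the toy table.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for an lgx reduction, used so the sketch runs
# without the pyARD database. With py-ard installed, the reducer could
# instead be something like:
#   ard = pyard.init()
#   reduce_fn = lambda allele: ard.redux(allele, "lgx")
TOY_LGX: Dict[str, str] = {
    "A*01:01:01:01": "A*01:01",
    "A*01:01:02": "A*01:01",
    "A*02:01:01:01": "A*02:01",
    "A*02:05:01": "A*02:05",
}


def toy_reduce(allele: str) -> str:
    """Return the ARD-level name from the toy table, or the input unchanged."""
    return TOY_LGX.get(allele, allele)


def ard_allele_mm(
    donor: Tuple[str, str],
    recip: Tuple[str, str],
    reduce_fn: Callable[[str], str] = toy_reduce,
) -> int:
    """Count distinct donor ARD-level alleles not found in the recipient.

    Assumption: mismatches are counted in the donor-to-recipient direction,
    and a homozygous mismatched donor counts as 1.
    """
    donor_ard = {reduce_fn(a) for a in donor}
    recip_ard = {reduce_fn(a) for a in recip}
    return len(donor_ard - recip_ard)


# Donor and recipient differ beyond the second field at A*01:01,
# so only A*02:01 vs A*02:05 is a mismatch at the ARD level.
mm = ard_allele_mm(
    ("A*01:01:01:01", "A*02:01:01:01"),
    ("A*01:01:02", "A*02:05:01"),
)
print(mm)
```

Comparing reduced strings as sets keeps the comparison independent of the order in which the two alleles at a locus are listed.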

lgragert commented 10 months ago

Top priority - we should make a version of the output files that excludes the two-field allele mismatch columns and provides only ARD-level allele mismatch, so as not to confuse Keith and Ryan.

lgragert commented 10 months ago

This won't be as complicated as allele-level TRS, because we're choosing one pair per multiple imputation replicate.

alyspayn commented 9 months ago

Created columns that begin with ARD_* to distinguish them from the two-field level typing columns.

Typing columns are called ARD_REC_*_1,2 and ARD_DON_*_1,2.

Allele MM columns are called ARD_*_ALLELE_MM.

(where *=locus)
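As an illustration of how this naming pattern expands, here is a small sketch. The helper name and the example locus list are hypothetical, and reading "_1,2" as two columns ending in _1 and _2 is an assumption.

```python
from typing import List


def ard_column_names(loci: List[str]) -> List[str]:
    """Expand the ARD_* naming pattern for each locus.

    Assumes "_1,2" means two typing columns per locus, suffixed _1 and _2.
    """
    cols: List[str] = []
    for locus in loci:
        for side in ("REC", "DON"):  # recipient, then donor typing columns
            for n in (1, 2):
                cols.append(f"ARD_{side}_{locus}_{n}")
        cols.append(f"ARD_{locus}_ALLELE_MM")  # ARD-level allele mismatch
    return cols


# Example loci chosen only for illustration.
for name in ard_column_names(["A", "DRB1"]):
    print(name)
```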