immunogenomics / HLA-TAPAS

HLA-TAPAS pipeline for HLA association and fine-mapping studies

How to interpret HLA imputation output from TopMed Michigan web server using Multiethnic ref panel HLA #12

samreenzafer closed this issue 2 years ago

samreenzafer commented 2 years ago

Hi, the Michigan Imputation Server documentation refers users to HLA-TAPAS: https://imputationserver.readthedocs.io/en/latest/pipeline/#hla-imputation-pipeline

I used the Michigan Imputation Server to impute HLA alleles for >4,000 samples using the multi-ethnic reference panel. I noticed that it sometimes imputes more than two alleles for a sample at a given locus. Examples:

| sample.id | SNP | MAF | Rsq |
| --- | --- | --- | --- |
| 1017203 | HLA_A*33:01:01:01 | 0.01409 | 0.80812 |
| 1017203 | HLA_A*02:01:01:01 | 0.19351 | 0.91651 |
| 1017203 | HLA_A*25:01:01:01 | 0.01453 | 0.82306 |

and

| sample.id | SNP | MAF | Rsq | Genotype |
| --- | --- | --- | --- | --- |
| 1014715 | HLA_A*66:02 | 0.0027 | 0.86 | TA |
| 1014715 | HLA_A*02:01:01:01 | 0.19 | 0.91 | TT |

Does TT mean that both alleles at the locus are 02:01:01:01?

and

| sample.id | SNP | MAF | Rsq |
| --- | --- | --- | --- |
| 1001658 | HLA_C*06:02:01:01 | 0.08554 | 0.98787 |
| 1001658 | HLA_C*07:01:01:01 | 0.12607 | 0.9789 |
| 1001658 | HLA_C*07:02:01:01 | 0.10883 | 0.98047 |
| 1001658 | HLA_C*07:04:01:01 | 0.01232 | 0.95232 |
| 1001658 | HLA_C*07:96:01 | 0.00002 | 0.0781 |
| 1001658 | HLA_C*08:01:01:01 | 0.00395 | 0.88016 |
| 1001658 | HLA_C*08:03:01 | 0.00045 | 0.49729 |
| 1001658 | HLA_C*08:04:01 | 0.00445 | 0.68034 |
| 1001658 | HLA_C*14:03 | 0.00256 | 0.96677 |
| 1001658 | HLA_C*04:01:01:01 | 0.13301 | 0.86631 |
| 1001658 | HLA_C*07:04:01:01 | 0.01232 | 0.95232 |

What does this type of result actually mean?

yluo86 commented 2 years ago

Hi @samreenzafer,

Thank you for using our imputation panel. Just so that I understand these outputs: what are the imputed genotypes for each allele? In the imputed VCF file, T = Present and A = Absent. So if, say, individual 1001658 has genotype "AA" at HLA_C*06:02:01:01, the individual does not carry a copy of HLA_C*06:02:01:01. As you suggested, a "TT" means the individual carries two copies of the same classical HLA allele.
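As a rough illustration of that encoding, here is a minimal Python sketch that counts allele copies from the two-letter genotype and sums them per locus. The helper name `allele_copies` is hypothetical (it is not part of HLA-TAPAS or the imputation server), and the two calls are the ones reported above for sample 1014715.

```python
def allele_copies(genotype: str) -> int:
    """Count copies of a classical allele in a two-letter genotype.

    Assumes the encoding described above: T = present, A = absent.
    """
    if len(genotype) != 2 or set(genotype) - {"A", "T"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return genotype.count("T")


# Calls reported above for sample 1014715 at HLA-A.
calls = {
    "HLA_A*66:02": "TA",
    "HLA_A*02:01:01:01": "TT",
}

copies = {allele: allele_copies(gt) for allele, gt in calls.items()}
total = sum(copies.values())

for allele, n in copies.items():
    print(f"{allele}: {n} copy/copies")

# A diploid sample should carry at most two alleles per locus,
# so a larger total flags a locus worth a closer look.
if total > 2:
    print(f"HLA-A: {total} copies called, more than the expected 2")
```

A total above two at a locus does not say which call is wrong, only that the hard calls at that locus are inconsistent and the underlying probabilities are worth inspecting.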

Sometimes, when we are unsure which genotype is correct, we will split the reads between more than two classical alleles. In that case, you might want to check the "genotype probability" field for the estimated posterior probabilities of the genotypes AA, AT and TT.

Hope this helps you understand the imputed VCF.
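For a single allele, reading those probabilities might look like the sketch below. It assumes the field is a comma-separated triple ordered as AA, AT, TT (the order given above). The function names and the example value `"0.05,0.90,0.05"` are made up for illustration and do not come from the server output.

```python
GENOTYPES = ("AA", "AT", "TT")  # assumed order of the three probabilities


def parse_gp(gp_field: str) -> dict:
    """Map a comma-separated probability triple to genotype labels."""
    probs = [float(x) for x in gp_field.split(",")]
    if len(probs) != len(GENOTYPES):
        raise ValueError(f"expected 3 probabilities, got {len(probs)}")
    return dict(zip(GENOTYPES, probs))


def most_likely_genotype(gp: dict) -> str:
    """Return the genotype with the highest posterior probability."""
    return max(gp, key=gp.get)


def expected_copies(gp: dict) -> float:
    """Expected number of allele copies: P(AT) + 2 * P(TT)."""
    return gp["AT"] + 2 * gp["TT"]


gp = parse_gp("0.05,0.90,0.05")  # hypothetical value, not real output
print(most_likely_genotype(gp))
print(expected_copies(gp))
```

Comparing the expected copies across all alleles at a locus is one way to see which of the competing calls carries most of the probability.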