sbslee / pypgx

A Python package for pharmacogenomics (PGx) research
https://pypgx.readthedocs.io
MIT License
65 stars 12 forks

Simplifying star allele calls to ignore Unknown Function #139

Closed toddknutson closed 2 months ago

toddknutson commented 3 months ago

We are interested in using pypgx to call star alleles, but (in one scenario) we do not want to report star alleles that have Unknown or Uncertain function. For example, if a sample was called CYP2D6 *119, it would be labeled "Unknown Function." In this situation, the sample also contains the two SNPs that define *1, and we would like to simplify this call to *1 instead of *119 (basically ignoring the other variants).

Is this kind of processing possible in pypgx? Or is there a way to modify the input data tables to restrict star allele calls to only alleles with a known function? I am curious to hear your perspective on this kind of strategy, or whether you have other suggestions that would allow us to simplify our calls to report only well-understood alleles/haplotypes.

This is not a software issue or problem. If you have time to consider the question, I would appreciate any feedback. Thanks again!

sbslee commented 2 months ago

Hi @toddknutson,

If I understood your situation correctly, you do not want PyPGx to use star alleles with unknown or uncertain function when returning the final diplotypes. This approach of using only a select subset of star alleles for genotyping is generally discouraged because it defeats the purpose of using the star allele nomenclature.

That being said, if you insist, I can think of a couple of ways to achieve your goal:

  1. The simplest way would be to look at the results.zip file (SampleTable[Results]) and, instead of using the value in the Genotype column, pick the star alleles that match your desired criteria from the Haplotype1 and Haplotype2 columns, which contain the candidate star allele(s) for each haplotype. For more details, please see the Results interpretation section of the Read the Docs documentation.
  2. This option may require some programming, but you could modify how PyPGx picks the final star alleles. Basically, you will need to modify the pypgx.api.core.sort_alleles function so that it behaves the way you want.
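
As a rough illustration of option 1, here is a minimal sketch of the post-processing step. It is not part of PyPGx. It assumes the Haplotype1 and Haplotype2 values have already been read out of the results table as semicolon-separated strings of candidate alleles (for example "*119;*1;"); please check the actual format in your own output. The `ALLELE_FUNCTION` table, the helper names, and the sample strings are hypothetical and only there for demonstration.

```python
# Hypothetical function annotations for a handful of CYP2D6 alleles.
# In practice these should come from whatever allele table you trust.
ALLELE_FUNCTION = {
    "*1": "Normal Function",
    "*2": "Normal Function",
    "*4": "No Function",
    "*119": "Unknown Function",
}

# Function labels we do not want to report.
EXCLUDED = {"Unknown Function", "Uncertain Function"}


def parse_candidates(field):
    """Split a semicolon-separated candidate string into allele names.

    Assumes a format like '*119;*1;' (an assumption, verify on real output).
    """
    return [allele for allele in field.split(";") if allele]


def first_known(field, functions=ALLELE_FUNCTION, excluded=EXCLUDED):
    """Return the first candidate with an annotated, non-excluded function.

    Returns None when no candidate qualifies, so the caller can decide
    whether to fall back to the original Genotype value or flag the sample.
    """
    for allele in parse_candidates(field):
        function = functions.get(allele)
        if function is not None and function not in excluded:
            return allele
    return None


def simplified_diplotype(hap1, hap2):
    """Combine the filtered picks for both haplotypes, or None if either fails."""
    a, b = first_known(hap1), first_known(hap2)
    if a is None or b is None:
        return None
    return f"{a}/{b}"


# Made-up candidate strings, for illustration only.
print(simplified_diplotype("*119;*1;", "*4;"))
```

Note that this takes the first acceptable candidate in the order given; whether that order is the one you want is something to confirm against the documentation. Samples for which no candidate qualifies come back as `None` rather than being silently relabeled.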

I strongly recommend that you report PyPGx outputs as is, but if you are going to explore the two options above and need help, I'd be happy to help.

toddknutson commented 2 months ago

Thank you for your detailed reply. This is very helpful! I'll take a close look at your suggestions and reopen this thread if I have any other questions. Thanks again!