sbslee / pypgx

A Python package for pharmacogenomics (PGx) research
https://pypgx.readthedocs.io
MIT License

AlleleNotFoundError: ('NAT2', '*12') #138

Closed. toddknutson closed this issue 3 months ago

toddknutson commented 3 months ago

Hi,

I have been running pypgx run-ngs-pipeline in a loop for many genes, including NAT2. I have processed the same samples multiple times (for testing purposes) without any problem with NAT2. Today, I am getting the error below. Have you seen this before? I'm not sure how to troubleshoot it.

I also noticed that the nomenclature for NAT2 has changed (the reference allele *12 has been reassigned to *1). Could this have any impact? I still see *12 in the allele-table.csv within pypgx.
https://nat.mbg.duth.gr/Human_NAT2_alleles.htm
https://www.pharmgkb.org/gene/PA18

Thanks a lot for your helpful tool!!

Traceback (most recent call last):
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/bin/pypgx", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/__main__.py", line 33, in main
    commands[args.command].main(args)
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/cli/run_ngs_pipeline.py", line 159, in main
    pipeline.run_ngs_pipeline(
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/api/pipeline.py", line 247, in run_ngs_pipeline
    alleles = utils.predict_alleles(consolidated_variants)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/api/utils.py", line 1179, in predict_alleles
    candidates = one_haplotype(observed)
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/api/utils.py", line 1146, in one_haplotype
    candidates = core.sort_alleles(candidates, by='priority', gene=gene, assembly=assembly)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/api/core.py", line 1598, in sort_alleles
    return sorted(alleles, key=funcs[by])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/api/core.py", line 1565, in func1
    function = get_function(gene, allele)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/group/share/conda/envs/clia_pgx-2024.07.19.modified/lib/python3.12/site-packages/pypgx/api/core.py", line 483, in get_function
    raise sdk.utils.AlleleNotFoundError(gene, allele)
pypgx.sdk.utils.AlleleNotFoundError: ('NAT2', '*12')
ERROR: pypgx run-ngs-pipeline "${gene}" "${out_dir}" --variants "${vcf}" --platform "Targeted" --depth-of-coverage "depth_of_coverage.zip" --assembly "${assembly}" --control-statistics "control-statistics.zip"

FYI, below is a list of the consolidated variants:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA17115 NA17078 NA17240 NA17019 NA02016 NA17247
8   18399145    .   C   T   .   .   Phased  GT:AD:DP:AF 0|0:0,0:.:. 0|0:0,0:.:. 1|1:0,1:.:0.000,1.000   0|0:0,0:.:. 0|0:0,0:.:. 0|0:0,0:.:.
8   18399149    .   T   C   .   .   Phased  GT:AD:DP:AF 1|1:0,0:.:. 1|1:0,0:.:. 1|1:0,1:.:0.000,1.000   1|1:0,0:.:. 1|1:0,0:.:. 1|1:0,0:.:.
8   18399489    .   T   C   .   .   .   GT:AD:DP:AF:PE  0|0:1,0:.:1.000,0.000:0,0,0,0   0|0:1,0:.:1.000,0.000:0,0,0,0   .|.:0,0:.:.:0,0,0,0 .|.:0,0:.:.:0,0,0,0 0|0:2,0:.:1.000,0.000:0,0,0,0   1|1:0,1:.:0.000,1.000:0,0,0,0
8   18399513    .   A   G   .   .   Phased  GT:AD:DP:AF 1|1:0,0:.:. 1|1:0,1:.:0.000,1.000   0|0:1,0:.:1.000,0.000   0|0:0,0:.:. 1|1:0,1:.:0.000,1.000   0|0:1,0:.:1.000,0.000
8   18399749    .   C   G   .   .   Phased  GT:AD:DP:AF 0|1:1,1:.:0.500,0.500   0|0:1,0:.:1.000,0.000   0|0:7,0:.:1.000,0.000   0|0:3,0:.:1.000,0.000   0|0:2,0:.:1.000,0.000   0|0:4,0:.:1.000,0.000
8   18399770    .   C   T   .   .   Phased  GT:AD:DP:AF 1|1:0,1:.:0.000,1.000   1|1:0,2:.:0.000,1.000   0|0:6,0:.:1.000,0.000   0|0:2,0:.:1.000,0.000   1|1:0,2:.:0.000,1.000   0|0:4,0:.:1.000,0.000
8   18400285    .   C   T   .   .   Phased  GT:AD:DP:AF 1|1:0,91:.:0.000,1.000  0|0:130,0:.:1.000,0.000 0|0:125,0:.:1.000,0.000 0|0:105,0:.:1.000,0.000 0|0:76,0:.:1.000,0.000  1|1:0,106:.:0.000,1.000
8   18400344    .   T   C   .   .   Phased  GT:AD:DP:AF 0|0:85,0:.:1.000,0.000  1|1:0,103:.:0.000,1.000 0|0:124,0:.:1.000,0.000 0|0:113,0:.:1.000,0.000 1|1:0,81:.:0.000,1.000  0|0:110,0:.:1.000,0.000
8   18400484    .   C   T   .   .   Phased  GT:AD:DP:AF 0|0:112,0:.:1.000,0.000 1|1:0,107:.:0.000,1.000 0|0:109,0:.:1.000,0.000 0|0:103,0:.:1.000,0.000 1|1:0,98:.:0.000,1.000  0|0:108,0:.:1.000,0.000
8   18400593    .   G   A   .   .   Phased  GT:AD:DP:AF 1|0:63,57:.:0.525,0.475 0|0:110,1:.:0.991,0.009 0|0:116,0:.:1.000,0.000 0|0:88,0:.:1.000,0.000  0|0:107,0:.:1.000,0.000 1|1:0,84:.:0.000,1.000
8   18400806    .   G   A   .   .   Phased  GT:AD:DP:AF 1|1:0,103:.:0.000,1.000 0|0:103,0:.:1.000,0.000 1|1:0,85:.:0.000,1.000  1|1:0,96:.:0.000,1.000  0|0:95,0:.:1.000,0.000  1|1:1,90:.:0.011,0.989
8   18400841    .   G   A   .   .   Phased  GT:AD:DP:AF 1|0:56,47:.:0.544,0.456 0|0:95,0:.:1.000,0.000  0|0:84,0:.:1.000,0.000  0|0:97,0:.:1.000,0.000  0|0:88,0:.:1.000,0.000  0|0:97,0:.:1.000,0.000
8   18401024    .   T   C   .   .   Phased  GT:AD:DP:AF 0|0:16,0:.:1.000,0.000  0|1:4,7:.:0.364,0.636   0|0:9,0:.:1.000,0.000   0|0:20,0:.:1.000,0.000  1|0:4,4:.:0.500,0.500   0|0:9,0:.:1.000,0.000
8   18401193    .   A   G   .   .   .   GT:AD:DP:AF:PE  0|1:2,1:.:0.667,0.333:0,0,0,0   0|0:4,0:.:1.000,0.000:0,0,0,0   0|0:5,0:.:1.000,0.000:0,0,0,0   0|0:2,0:.:1.000,0.000:0,0,0,0   0|0:3,0:.:1.000,0.000:0,0,0,0   0|0:4,0:.:1.000,0.000:0,0,0,0
8   18401398    .   G   A   .   .   Phased  GT:AD:DP:AF 1|0:0,0:.:. 0|0:0,0:.:. 0|0:0,0:.:. 0|0:0,0:.:. 0|0:4,0:.:1.000,0.000   1|1:0,1:.:0.000,1.000

sbslee commented 3 months ago

Hi @toddknutson,

As for the AlleleNotFoundError issue, I'm not able to replicate it:

(fuc) sbslee@Seung-beens-MacBook-Air ~ % pypgx -v
pypgx 0.25.0
(fuc) sbslee@Seung-beens-MacBook-Air ~ % python
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:53:34) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pypgx
>>> pypgx.get_function('CYP2D6', '*1')
'Normal Function'
>>> pypgx.get_function('NAT2', '*12')
'Unknown Function'
>>> pypgx.get_function('NAT2', 'THIS_ALLELE_DOES_NOT_EXIST')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/fuc/lib/python3.10/site-packages/pypgx/api/core.py", line 483, in get_function
    raise sdk.utils.AlleleNotFoundError(gene, allele)
pypgx.sdk.utils.AlleleNotFoundError: ('NAT2', 'THIS_ALLELE_DOES_NOT_EXIST')

Which pypgx version are you currently using? As you can see above, I've used the latest version (v0.25.0) for the example.

And if I understood your question correctly, you have successfully run the PyPGx pipeline on the same samples before and the issue appeared suddenly, right? That suggests you or someone on your team may have (accidentally) modified the allele-table.csv file, specifically the NAT2*12 entry. Check the file's last-modified time.
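One way to run that check is sketched below. This is only an illustration, not part of PyPGx: the helper names (find_package_files, last_modified, rows_with) are made up for this example, and it assumes the bundled table is a file named allele-table.csv somewhere inside the installed pypgx package directory. The row check matches exact field values, so it does not depend on the table's column names.

```python
import csv
import datetime
import importlib.util
from pathlib import Path


def find_package_files(package_name, filename):
    """Return all files named `filename` under an installed package's directory."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or not a regular package
    found = []
    for root in spec.submodule_search_locations:
        found.extend(sorted(Path(root).rglob(filename)))
    return found


def last_modified(path):
    """Return the file's last-modified time as a local datetime."""
    return datetime.datetime.fromtimestamp(Path(path).stat().st_mtime)


def rows_with(path, gene, allele):
    """Return CSV rows that contain both `gene` and `allele` as exact field values."""
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        return [row for row in csv.reader(handle) if gene in row and allele in row]


if __name__ == "__main__":
    # Assumption: the table ships inside the pypgx package under this file name.
    for table in find_package_files("pypgx", "allele-table.csv"):
        print(table)
        print("  last modified:", last_modified(table))
        print("  NAT2 *12 rows:", len(rows_with(table, "NAT2", "*12")))
```

If the reported modification time is later than the date the environment was installed, or the NAT2 *12 row count is zero, the local copy has probably been edited.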

Thank you for letting me know about the recent change in the NAT2 nomenclature. I will take a look and try to reflect this change in PyPGx ASAP. However, please note that this change is not related to the issue at hand unless, as I mentioned above, the local copy of the allele-table.csv file was somehow modified.

Hope this helps.

toddknutson commented 3 months ago

Hi @sbslee

Thanks for your quick reply! I will investigate my allele-table.csv. Yes, it just started happening without any changes to my code, so I think your suggestion is probably correct. I am using version 0.25.0 (for both the tool and the bundle). I'll test later today. Since you cannot reproduce it, I'll close this issue and add a comment if I figure out the problem. Thanks again!

toddknutson commented 3 months ago

Yes, you were correct! The allele-table.csv was modified and that caused the issue. I restored the original table and it works as expected. 👏