.. This file was automatically generated by docs/create.py.
README
.. image:: https://badge.fury.io/py/pypgx.svg :target: https://badge.fury.io/py/pypgx
.. image:: https://readthedocs.org/projects/pypgx/badge/?version=latest :target: https://pypgx.readthedocs.io/en/latest/?badge=latest :alt: Documentation Status
.. image:: https://anaconda.org/bioconda/pypgx/badges/version.svg :target: https://anaconda.org/bioconda/pypgx
.. image:: https://anaconda.org/bioconda/pypgx/badges/license.svg :target: https://github.com/sbslee/pypgx/blob/master/LICENSE
.. image:: https://anaconda.org/bioconda/pypgx/badges/downloads.svg :target: https://anaconda.org/bioconda/pypgx/files
The main purpose of the PyPGx package is to provide a unified platform for pharmacogenomics (PGx) research. PyPGx is and always will be completely free and open source.
The package is written in Python, and supports both command line interface
(CLI) and application programming interface (API) whose documentations are
available at the Read the Docs <https://pypgx.readthedocs.io/en/latest/>
_.
Quick links:
README <https://pypgx.readthedocs.io/en/latest/readme.html>
__Genes <https://pypgx.readthedocs.io/en/latest/genes.html>
__Glossary <https://pypgx.readthedocs.io/en/latest/glossary.html>
__Tutorials <https://pypgx.readthedocs.io/en/latest/tutorials.html>
__CLI <https://pypgx.readthedocs.io/en/latest/cli.html>
__API <https://pypgx.readthedocs.io/en/latest/api.html>
__SDK <https://pypgx.readthedocs.io/en/latest/sdk.html>
__FAQ <https://pypgx.readthedocs.io/en/latest/faq.html>
__Changelog <https://pypgx.readthedocs.io/en/latest/changelog.html>
__PyPGx can predict PGx genotypes (e.g. *4/*5
) and phenotypes (e.g.
Poor Metabolizer
) using various genomic data, including data from
next-generation sequencing (NGS), single nucleotide polymorphism (SNP) array,
and long-read sequencing. Importantly, for NGS data the package can detect
structural variation (SV) <https://pypgx.readthedocs.io/en/latest/ glossary.html#structural-variation-sv>
__ using a machine learning-based
approach. Finally, note that PyPGx is compatible with both of the Genome
Reference Consortium Human (GRCh) builds, GRCh37 (hg19) and GRCh38 (hg38).
There are currently 87 pharmacogenes in PyPGx:
.. list-table::
Your contributions (e.g. feature ideas, pull requests) are most welcome.
| Author: Seung-been "Steven" Lee | Email: sbstevenlee@gmail.com | License: MIT License
If you use PyPGx in a published analysis, please report the program version and cite the following article:
ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation <https://doi.org/10.1371/journal.pone.0272129>
__. PLOS ONE.In this article, PyPGx was used to call star alleles for genomic DNA
reference materials from the Centers for Disease Control and Prevention–based
Genetic Testing Reference Materials Coordination Program (GeT-RM) <https://pypgx.readthedocs.io/en/latest/glossary.html# genetic-testing-reference-materials-coordination-program-get-rm>
__, where it
showed almost 100% concordance with genotype results from previous works.
The development of PyPGx was heavily inspired by Stargazer <https:// stargazer.gs.washington.edu/stargazerweb/>
__, another star-allele calling
tool developed by Steven when he was in his PhD program at the University of
Washington. Therefore, please also cite the following articles:
Calling star alleles with Stargazer in 28 pharmacogenes with whole genome sequences <https://doi.org/10.1002/cpt.1552>
__. Clinical Pharmacology & Therapeutics.Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model <https://doi.org/10.1038/s41436-018-0054-0>
__. Genetics in Medicine.Below is an incomplete list of publications which have used PyPGx:
Pharmacogenetic variation in Neanderthals and Denisovans and implications for human health and response to medications <https://doi.org/10.1101/2021.11.27.470071>
__. bioRxiv.Phased Haplotype Resolution of the SLC6A4 Promoter Using Long-Read Single Molecule Real-Time (SMRT) Sequencing <https://doi.org/10.3390/genes11111333>
__. Genes.If you find my work useful, please consider becoming a sponsor <https://github.com/sponsors/sbslee>
__.
Following packages are required to run PyPGx:
.. list-table:: :header-rows: 1
fuc
scikit-learn
openjdk
There are various ways you can install PyPGx. The recommended way is via
conda (Anaconda <https://www.anaconda.com/>
__):
.. code-block:: text
$ conda install -c bioconda pypgx
Above will automatically download and install all the dependencies as well.
Alternatively, you can use pip (PyPI <https://pypi.org/>
__) to install
PyPGx and all of its dependencies except openjdk
(i.e. Java JDK must be
installed separately):
.. code-block:: text
$ pip install pypgx
Finally, you can clone the GitHub repository and then install PyPGx locally:
.. code-block:: text
$ git clone https://github.com/sbslee/pypgx $ cd pypgx $ pip install .
The nice thing about this approach is that you will have access to
development versions that are not available in Anaconda or PyPI. For example,
you can access a development branch with the git checkout
command. When
you do this, please make sure your environment already has all the
dependencies installed.
.. note::
Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>
__
is one of the default software tools used by PyPGx for haplotype phasing
SNVs and indels. The program is freely available and published under the
GNU General Public License <https://faculty.washington.edu/browning/ beagle/gpl_license>
__. Users do not need to download Beagle separately
because a copy of the software (beagle.22Jul22.46e.jar
) is already
included in PyPGx.
.. warning:: You're not done yet! Keep scrolling down to obtain the resource bundle for PyPGx, which is essential for running the package.
Starting with the 0.12.0 version, reference haplotype panel files and
structural variant classifier files in PyPGx are moved to the
pypgx-bundle
repository <https://github.com/sbslee/pypgx-bundle>
__
(only those files are moved; other files such as allele-table.csv
and
variant-table.csv
are intact). Therefore, the user must clone the
pypgx-bundle
repository with matching PyPGx version to their home
directory in order for PyPGx to correctly access the moved files (i.e. replace
x.x.x
with the version number of PyPGx you're using, such as 0.18.0
):
.. code-block:: text
$ cd ~ $ git clone --branch x.x.x --depth 1 https://github.com/sbslee/pypgx-bundle
This is undoubtedly annoying, but absolutely necessary for portability reasons because PyPGx has been growing exponentially in file size due to the increasing number of genes supported and their variation complexity, to the point where it now exceeds upload size limit for PyPI (100 Mb). After removal of those files, the size of PyPGx has reduced from >100 Mb to <1 Mb.
Starting with version 0.22.0, you can now specify a custom location for the
pypgx-bundle
directory instead of using the home directory. This can be
achieved by setting the bundle location using the PYPGX_BUNDLE
environment
variable:
.. code-block:: text
$ export PYPGX_BUNDLE=/path/to/pypgx-bundle
Many pharmacogenes are known to have structural variation (SV) <https://pypgx.readthedocs.io/en/latest/glossary.html#structural-variation- sv>
such as gene deletions, duplications, and hybrids. You can visit the
Genes <https://pypgx.readthedocs.io/en/latest/genes.html>
page to see the
list of genes with SV.
Some of the SV events can be quite challenging to detect accurately with NGS
data due to misalignment of sequence reads caused by sequence homology with
other gene family members (e.g. CYP2D6 and CYP2D7). PyPGx attempts to address
this issue by training a support vector machine (SVM) <https://scikit- learn.org/stable/modules/generated/sklearn.svm.SVC.html>
-based multiclass
classifier using the one-vs-rest strategy <https://scikit-learn.org/stable /modules/generated/sklearn.multiclass.OneVsRestClassifier.html>
for each
gene for each GRCh build. Each classifier is trained using copy number
profiles of real NGS samples as well as simulated ones, including those from
1KGP <https://pypgx.readthedocs.io/en/latest/glossary.html#genomes-project- 1kgp>
and GeT-RM <https://pypgx.readthedocs.io/en/latest/ glossary.html#genetic-testing-reference-materials-coordination-program-get-rm>
.
You can plot copy number profile and allele fraction profile with PyPGx to visually inspect SV calls. Below are CYP2D6 examples:
.. list-table:: :header-rows: 1 :widths: 10 30 60
PyPGx was recently applied to the entire high-coverage WGS dataset from 1KGP
(N=2,504). Click here <https://github.com/sbslee/1kgp-pgx-paper/tree/main/ sv-tables>
__ to see individual SV calls, and corresponding copy number
profiles and allele fraction profiles.
When working with PGx data, it's not uncommon to encounter a situation
where you are handling GRCh37 data in one project but GRCh38 in another. You
may be tempted to use tools like LiftOver
to convert GRCh37 to GRCh38, or
vice versa, but deep down you know it's going to be a mess (and please don't
do this). The good news is, PyPGx supports both of the builds!
In many PyPGx actions, you can simply indicate which genome build to use. For
example, for GRCh38 data you can use --assembly GRCh38
in CLI and
assembly='GRCh38'
in API. Note that GRCh37 will always be the
default. Below is an example of using the API:
.. code:: python3
>>> import pypgx
>>> pypgx.list_variants('CYP2D6', alleles=['*4'], assembly='GRCh37')
['22-42524947-C-T']
>>> pypgx.list_variants('CYP2D6', alleles=['*4'], assembly='GRCh38')
['22-42128945-C-T']
However, there is one important caveat to consider if your sequencing data is
GRCh38. That is, sequence reads must be aligned only to the main contigs
(i.e. chr1
, chr2
, ..., chrX
, chrY
), and not to the
alternative (ALT) contigs such as chr1_KI270762v1_alt
. This is because
the presence of ALT contigs reduces the sensitivity of variant calling
and many other analyses including SV detection. Therefore, if you have
sequencing data in GRCh38, make sure it's aligned to the main contigs only.
The only exception to above rule is the GSTT1 gene, which is located on
chr22
for GRCh37 but on chr22_KI270879v1_alt
for GRCh38. This gene is
known to have an extremely high rate of gene deletion polymorphism in the
population and thus requires SV analysis. Therefore, if you are interested in
genotyping this gene with GRCh38 data, then you must include that contig
when performing read alignment. To this end, you can easily filter your
reference FASTA file before read alignment so that it only contains the main
contigs plus the ALT contig. If you don't know how to do this, here's one way
using the fuc
program (which should have already been installed along
with PyPGx):
.. code-block:: text
$ cat contigs.list
chr1
chr2
...
chrX
chrY
chr22_KI270879v1_alt
$ fuc fa-filter in.fa --contigs contigs.list > out.fa
In order to efficiently store and transfer data, PyPGx uses the ZIP archive
file format (.zip
) which supports lossless data compression. Each archive
file created by PyPGx has a metadata file (metadata.txt
) and a data file
(e.g. data.tsv
, data.vcf
). A metadata file contains important
information about the data file within the same archive, which is expressed
as pairs of =
-separated keys and values (e.g. Assembly=GRCh37
):
.. list-table:: :widths: 20 40 40 :header-rows: 1
* - Metadata
- Description
- Examples
* - ``Assembly``
- Reference genome assembly.
- ``GRCh37``, ``GRCh38``
* - ``Control``
- Control gene.
- ``VDR``, ``chr1:10000-20000``
* - ``Gene``
- Target gene.
- ``CYP2D6``, ``GSTT1``
* - ``Platform``
- Genotyping platform.
- ``WGS``, ``Targeted``, ``Chip``, ``LongRead``
* - ``Program``
- Name of the phasing program.
- ``Beagle``, ``SHAPEIT``
* - ``Samples``
- Samples used for inter-sample normalization.
- ``NA07000,NA10854,NA11993``
* - ``SemanticType``
- Semantic type of the archive.
- ``CovFrame[CopyNumber]``, ``Model[CNV]``
Notably, all archive files have defined semantic types, which allows us to ensure that the data that is passed to a PyPGx command (CLI) or method (API) is meaningful for the operation that will be performed. Below is a list of currently defined semantic types:
CovFrame[CopyNumber]
Gene
, Assembly
, SemanticType
, Platform
, Control
, Samples
.CovFrame[DepthOfCoverage]
Assembly
, SemanticType
, Platform
.CovFrame[ReadDepth]
Gene
, Assembly
, SemanticType
, Platform
.Model[CNV]
Gene
, Assembly
, SemanticType
, Control
.SampleTable[Alleles]
Platform
, Gene
, Assembly
, SemanticType
, Program
.SampleTable[CNVCalls]
Gene
, Assembly
, SemanticType
, Control
.SampleTable[Genotypes]
Gene
, Assembly
, SemanticType
.SampleTable[Phenotypes]
Gene
, SemanticType
.SampleTable[Results]
Gene
, Assembly
, SemanticType
.SampleTable[Statistics]
Control
, Assembly
, SemanticType
, Platform
.VcfFrame[Consolidated]
Platform
, Gene
, Assembly
, SemanticType
, Program
.VcfFrame[Imported]
Platform
, Gene
, Assembly
, SemanticType
.VcfFrame[Phased]
Platform
, Gene
, Assembly
, SemanticType
, Program
.To demonstrate how easy it is to work with PyPGx archive files, below we will
show some examples. First, download an archive to play with, which has
SampleTable[Results]
as semantic type:
.. code-block:: text
$ wget https://raw.githubusercontent.com/sbslee/pypgx-data/main/getrm-wgs-tutorial/grch37-CYP2D6-results.zip
Let's print its metadata:
.. code-block:: text
$ pypgx print-metadata grch37-CYP2D6-results.zip
Gene=CYP2D6
Assembly=GRCh37
SemanticType=SampleTable[Results]
Now print its main data (but display first sample only):
.. code-block:: text
$ pypgx print-data grch37-CYP2D6-results.zip | head -n 2
Genotype Phenotype Haplotype1 Haplotype2 AlternativePhase VariantData CNV
HG00276_PyPGx *4/*5 Poor Metabolizer *4;*10;*74;*2; *10;*74;*2; ; *4:22-42524947-C-T:0.913;*10:22-42526694-G-A,22-42523943-A-G:1.0,1.0;*74:22-42525821-G-T:1.0;*2:default; DeletionHet
We can unzip it to extract files inside (note that tmpcty4c_cr
is the
original folder name):
.. code-block:: text
$ unzip grch37-CYP2D6-results.zip
Archive: grch37-CYP2D6-results.zip
inflating: tmpcty4c_cr/metadata.txt
inflating: tmpcty4c_cr/data.tsv
We can now directly interact with the files:
.. code-block:: text
$ cat tmpcty4c_cr/metadata.txt
Gene=CYP2D6
Assembly=GRCh37
SemanticType=SampleTable[Results]
$ head -n 2 tmpcty4c_cr/data.tsv
Genotype Phenotype Haplotype1 Haplotype2 AlternativePhase VariantData CNV
HG00276_PyPGx *4/*5 Poor Metabolizer *4;*10;*74;*2; *10;*74;*2; ; *4:22-42524947-C-T:0.913;*10:22-42526694-G-A,22-42523943-A-G:1.0,1.0;*74:22-42525821-G-T:1.0;*2:default; DeletionHet
We can easily create a new archive:
.. code-block:: text
$ zip -r grch37-CYP2D6-results-new.zip tmpcty4c_cr
adding: tmpcty4c_cr/ (stored 0%)
adding: tmpcty4c_cr/metadata.txt (stored 0%)
adding: tmpcty4c_cr/data.tsv (deflated 84%)
$ pypgx print-metadata grch37-CYP2D6-results-new.zip
Gene=CYP2D6
Assembly=GRCh37
SemanticType=SampleTable[Results]
Many genes in PyPGx have a genotype-phenotype table available from the Clinical Pharmacogenetics Implementation Consortium (CPIC) or the Pharmacogenomics Knowledge Base (PharmGKB). PyPGx uses these tables to perform phenotype prediction with one of the two methods:
Please visit the Genes <https://pypgx.readthedocs.io/en/latest/ genes.html>
__ page to see the list of genes with a genotype-phenotype
table and each of their prediction method.
To perform phenotype prediction with the API, you can use the
pypgx.predict_phenotype
method:
.. code:: python3
>>> import pypgx
>>> pypgx.predict_phenotype('CYP2D6', '*4', '*5') # Both alleles have no function
'Poor Metabolizer'
>>> pypgx.predict_phenotype('CYP2D6', '*5', '*4') # The order of alleles does not matter
'Poor Metabolizer'
>>> pypgx.predict_phenotype('CYP2D6', '*1', '*22') # *22 has uncertain function
'Indeterminate'
>>> pypgx.predict_phenotype('CYP2D6', '*1', '*1x2') # Gene duplication
'Ultrarapid Metabolizer'
To perform phenotype prediction with the CLI, you can use the
call-phenotypes
command. It takes a SampleTable[Genotypes]
file as
input and outputs a SampleTable[Phenotypes]
file:
.. code-block:: text
$ pypgx call-phenotypes genotypes.zip phenotypes.zip
PyPGx currently provides three pipelines for performing PGx genotype analysis
of single gene for one or multiple samples: NGS pipeline, chip pipeline, and
long-read pipeline. In additional to genotyping, each pipeline will perform
phenotype prediction based on genotype results. All pipelines are compatible
with both GRCh37 and GRCh38 (e.g. for GRCh38 use --assembly GRCh38
in CLI
and assembly='GRCh38'
in API).
.. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/flowchart-ngs-pipeline.png
Implemented as pypgx run-ngs-pipeline
in CLI and
pypgx.pipeline.run_ngs_pipeline
in API, this pipeline is designed for
processing short-read data (e.g. Illumina). Users must specify whether the
input data is from whole genome sequencing (WGS) or targeted sequencing
(custom targeted panel sequencing or whole exome sequencing).
This pipeline supports SV detection based on copy number analysis for genes
that are known to have SV. Therefore, if the target gene is associated with
SV (e.g. CYP2D6) it's strongly recommended to provide a
CovFrame[DepthOfCoverage]
file and a SampleTable[Statistics]
file in
addtion to a VCF file containing SNVs/indels. If the target gene is not
associated with SV (e.g. CYP3A5) providing a VCF file alone is enough. You can
visit the Genes <https://pypgx.readthedocs.io/en/latest/genes.html>
page
to see the full list of genes with SV. For details on SV detection algorithm,
please see the Structural variation detection <https://pypgx.readthedocs.io/ en/latest/readme.html#structural-variation-detection>
section.
When creating a VCF file (containing SNVs/indels) from BAM files, users have
a choice to either use the pypgx create-input-vcf
command (strongly
recommended) or a variant caller of their choice (e.g. GATK4
HaplotypeCaller). See the Variant caller choice <https://pypgx.readthedocs. io/en/latest/faq.html#variant-caller-choice>
__ section for detailed
discussion on when to use either option.
Check out the GeT-RM WGS tutorial <https://pypgx.readthedocs.io/en/latest/ tutorials.html#get-rm-wgs-tutorial>
__ to see this pipeline in action.
.. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/flowchart-chip-pipeline.png
Implemented as pypgx run-chip-pipeline
in CLI and
pypgx.pipeline.run_chip_pipeline
in API, this pipeline is designed for
DNA chip data (e.g. Global Screening Array from Illumina). It's recommended
to perform variant imputation on the input VCF prior to feeding it to the
pipeline using a large reference haplotype panel (e.g. TOPMed Imputation Server <https://imputation.biodatacatalyst.nhlbi.nih.gov/>
__).
Alternatively, it's possible to perform variant imputation with the 1000
Genomes Project (1KGP) data as reference within PyPGx using --impute
in
CLI and impute=True
in API.
The pipeline currently does not support SV detection. Please post a GitHub issue if you want to contribute your development skills and/or data for devising an SV detection algorithm.
Check out the Coriell Affy tutorial <https://pypgx.readthedocs.io/en/latest/ tutorials.html#coriell-affy-tutorial>
__ to see this pipeline in action.
.. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/flowchart-long-read-pipeline.png
Implemented as pypgx run-long-read-pipeline
in CLI and
pypgx.pipeline.run_long_read_pipeline
in API, this pipeline is designed
for long-read data (e.g. Pacific Biosciences and Oxford Nanopore
Technologies). The input VCF must be phased using a read-backed haplotype
phasing tool such as WhatsHap <https://github.com/whatshap/whatshap>
__.
The pipeline currently does not support SV detection. Please post a GitHub issue if you want to contribute your development skills and/or data for devising an SV detection algorithm.
PyPGx outputs per-sample genotype results in a table, which is stored in an
archive file with the semantic type SampleTable[Results]
. Below, we will
use the CYP2D6 gene with GRCh37 as an example to illustrate how to interpret
genotype results from PyPGx.
.. list-table:: :header-rows: 1
This list explains each of the columns in the example results.
pypgx.api.genotype
module (e.g. CYP2D6Genotyper <https://pypgx.readthedocs.io/en/latest/api.html#pypgx.api.genotype.CYP2D6Genotyper>
__).22-42523943-A-G
, 22-42524947-C-T
, and 22-42526694-G-A
, then it will get assigned *4;*10;
because the haplotype pattern can fit both *4 (22-42524947-C-T
) and *10 (22-42523943-A-G
and 22-42526694-G-A
). Note that *4 comes first before *10 because it has higher priority for reporting purposes (see the pypgx.sort_alleles
method <https://pypgx.readthedocs.io/en/latest/api.html#pypgx.api.core.sort_alleles>
__ for detailed implementation).22-42526694-G-A
for Haplotype1 and 22-42523805-C-T
for Haplotype2. Even though the two variants are in trans orientation, PyPGx will also consider alternative phase in case the two variants are actually in cis orientation, resulting in *69;
as AlternativePhase because *69 is defined by 22-42526694-G-A
and 22-42523805-C-T
.Structural variation detection <https://pypgx.readthedocs.io/en/latest/readme.html#structural-variation-detection>
__ section for more details.For detailed documentations on the CLI and API, please refer to the
Read the Docs <https://pypgx.readthedocs.io/en/latest/>
_.
For getting help on the CLI:
.. code-block:: text
$ pypgx -h
usage: pypgx [-h] [-v] COMMAND ...
positional arguments: COMMAND call-genotypes Call genotypes for target gene. call-phenotypes Call phenotypes for target gene. combine-results Combine various results for target gene. compare-genotypes Calculate concordance between two genotype results. compute-control-statistics Compute summary statistics for control gene from BAM files. compute-copy-number Compute copy number from read depth for target gene. compute-target-depth Compute read depth for target gene from BAM files. create-consolidated-vcf Create a consolidated VCF file. create-input-vcf Call SNVs/indels from BAM files for all target genes. create-regions-bed Create a BED file which contains all regions used by PyPGx. estimate-phase-beagle Estimate haplotype phase of observed variants with the Beagle program. filter-samples Filter Archive file for specified samples. import-read-depth Import read depth data for target gene. import-variants Import SNV/indel data for target gene. plot-bam-copy-number Plot copy number profile from CovFrame[CopyNumber]. plot-bam-read-depth Plot read depth profile with BAM data. plot-cn-af Plot both copy number profile and allele fraction profile in one figure. plot-vcf-allele-fraction Plot allele fraction profile with VCF data. plot-vcf-read-depth Plot read depth profile with VCF data. predict-alleles Predict candidate star alleles based on observed variants. predict-cnv Predict CNV from copy number data for target gene. prepare-depth-of-coverage Prepare a depth of coverage file for all target genes with SV from BAM files. print-data Print the main data of specified archive. print-metadata Print the metadata of specified archive. run-chip-pipeline Run genotyping pipeline for chip data. run-long-read-pipeline Run genotyping pipeline for long-read sequencing data. run-ngs-pipeline Run genotyping pipeline for NGS data. slice-bam Slice BAM file for all genes used by PyPGx. test-cnv-caller Test CNV caller for target gene. train-cnv-caller Train CNV caller for target gene.
options: -h, --help Show this help message and exit. -v, --version Show the version number and exit.
For getting help on a specific command (e.g. call-genotypes):
.. code-block:: text
$ pypgx call-genotypes -h
Below is the list of submodules available in the API:
For getting help on a specific submodule (e.g. utils
):
.. code:: python3
from pypgx.api import utils help(utils)
For getting help on a specific method (e.g. pypgx.predict_phenotype
):
.. code:: python3
import pypgx help(pypgx.predict_phenotype)
In Jupyter Notebook and Lab, you can see the documentation for a python
function by hitting SHIFT + TAB
. Hit it twice to expand the view.
We can print the metadata of an archive file:
.. code-block:: text
$ pypgx print-metadata grch37-depth-of-coverage.zip
Above will print:
.. code-block:: text
Assembly=GRCh37
SemanticType=CovFrame[DepthOfCoverage]
Platform=WGS
We can run the NGS pipeline for the CYP2D6 gene:
.. code-block:: text
$ pypgx run-ngs-pipeline \
CYP2D6 \
grch37-CYP2D6-pipeline \
--variants grch37-variants.vcf.gz \
--depth-of-coverage grch37-depth-of-coverage.zip \
--control-statistics grch37-control-statistics-VDR.zip
Above will create a number of archive files:
.. code-block:: text
Saved VcfFrame[Imported] to: grch37-CYP2D6-pipeline/imported-variants.zip
Saved VcfFrame[Phased] to: grch37-CYP2D6-pipeline/phased-variants.zip
Saved VcfFrame[Consolidated] to: grch37-CYP2D6-pipeline/consolidated-variants.zip
Saved SampleTable[Alleles] to: grch37-CYP2D6-pipeline/alleles.zip
Saved CovFrame[ReadDepth] to: grch37-CYP2D6-pipeline/read-depth.zip
Saved CovFrame[CopyNumber] to: grch37-CYP2D6-pipeline/copy-number.zip
Saved SampleTable[CNVCalls] to: grch37-CYP2D6-pipeline/cnv-calls.zip
Saved SampleTable[Genotypes] to: grch37-CYP2D6-pipeline/genotypes.zip
Saved SampleTable[Phenotypes] to: grch37-CYP2D6-pipeline/phenotypes.zip
Saved SampleTable[Results] to: grch37-CYP2D6-pipeline/results.zip
We can obtain allele function for the CYP2D6 gene:
.. code:: python3
>>> import pypgx
>>> pypgx.get_function('CYP2D6', '*1')
'Normal Function'
>>> pypgx.get_function('CYP2D6', '*4')
'No Function'
>>> pypgx.get_function('CYP2D6', '*22')
'Uncertain Function'
>>> pypgx.get_function('CYP2D6', '*140')
'Unknown Function'
We can predict phenotype for CYP2D6 based on two haplotype calls:
.. code:: python3
>>> import pypgx
>>> pypgx.predict_phenotype('CYP2D6', '*4', '*5') # Both alleles have no function
'Poor Metabolizer'
>>> pypgx.predict_phenotype('CYP2D6', '*5', '*4') # The order of alleles does not matter
'Poor Metabolizer'
>>> pypgx.predict_phenotype('CYP2D6', '*1', '*22') # *22 has uncertain function
'Indeterminate'
>>> pypgx.predict_phenotype('CYP2D6', '*1', '*1x2') # Gene duplication
'Ultrarapid Metabolizer'
We can also obtain recommendation (e.g. CPIC) for certain drug-phenotype combination:
.. code:: python3
>>> import pypgx
>>> # Codeine, an opiate and prodrug of morphine, is metabolized by CYP2D6
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Normal Metabolizer')
'Use codeine label recommended age- or weight-specific dosing.'
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Ultrarapid Metabolizer')
'Avoid codeine use because of potential for serious toxicity. If opioid use is warranted, consider a non-tramadol opioid.'
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Poor Metabolizer')
'Avoid codeine use because of possibility of diminished analgesia. If opioid use is warranted, consider a non-tramadol opioid.'
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Indeterminate')
'None'