sbslee / pypgx

A Python package for pharmacogenomics (PGx) research
https://pypgx.readthedocs.io
MIT License

Haplotypes called for IFNL3 #97

Closed sumudu-rangika closed 1 year ago

sumudu-rangika commented 1 year ago

Hi Steven,

I used 4x coverage WGS data (GRCh37 aligned) for ~2000 African ancestry individuals to identify variant alleles in the IFNL3 gene. I phased and imputed the variant file using GLIMPSE v1 and used this VCF as input to PyPGx v0.19.0.

This is an issue I came across in IFNL3 haplotypes. The only variant allele called in the data is rs12980275 (GRCh37 19:39731783 A-G). However, this variant is not defined in the PharmGKB allele definitions table.

The variant rs12979860 (GRCh37 19:39738787 C-T), which is defined in the PharmGKB gene tables, is also present in the VCF that I used as input to PyPGx. Although this variant is included in the PyPGx allele table, I am wondering why it has not been called.

Attached are screenshots displaying these two positions in the VCF. Appreciate your thoughts on this.

Screenshot1 Screenshot2

Thank you Best Regards Sumudu

sbslee commented 1 year ago

@sumudu-rangika,

Thanks for reporting the issue! I think I need more information to answer your question. Could you please share the following files (CYP3A5 is just an example; please send the IFNL3 files) so I can reproduce the problem and investigate further?

Saved VcfFrame[Imported] to: grch37-CYP3A5-pipeline/imported-variants.zip
Saved VcfFrame[Phased] to: grch37-CYP3A5-pipeline/phased-variants.zip
Saved VcfFrame[Consolidated] to: grch37-CYP3A5-pipeline/consolidated-variants.zip
Saved SampleTable[Alleles] to: grch37-CYP3A5-pipeline/alleles.zip
Saved SampleTable[Genotypes] to: grch37-CYP3A5-pipeline/genotypes.zip
Saved SampleTable[Phenotypes] to: grch37-CYP3A5-pipeline/phenotypes.zip
Saved SampleTable[Results] to: grch37-CYP3A5-pipeline/results.zip

Also, please show me the exact command line and the stdout messages.

sumudu-rangika commented 1 year ago

Hi Steven,

Thanks for the reply. I will e-mail you these result files for IFNL3.

Best Sumudu

sbslee commented 1 year ago

Hi @sumudu-rangika,

Thank you for sending the files. This issue is indeed a bug in PyPGx, and I fixed it in the 0.21.0-dev branch. You can try it yourself:

$ git clone https://github.com/sbslee/pypgx
$ cd pypgx
$ git checkout 0.21.0-dev
$ pip install .
$ pypgx run-ngs-pipeline \
    IFNL3 \
    grch37-IFNL3-pipeline \
    --variants imputed_phased_chr19.vcf.gz \
    --do-not-plot-copy-number \
    --do-not-plot-allele-fraction \
    --assembly GRCh37

Saved VcfFrame[Consolidated] to: grch37-IFNL3-pipeline/imported-variants.zip
Saved SampleTable[Alleles] to: grch37-IFNL3-pipeline/alleles.zip
Saved SampleTable[Genotypes] to: grch37-IFNL3-pipeline/genotypes.zip
Saved SampleTable[Phenotypes] to: grch37-IFNL3-pipeline/phenotypes.zip
Saved SampleTable[Results] to: grch37-IFNL3-pipeline/results.zip

Basically, the problem arose because the variant of interest, rs12979860 (19-39738787-C-T), lies outside IFNL3's target region (chr19:39731245-39738646): its position is 141 bp past the end of the region. Therefore, when PyPGx imported the input VCF, this variant was dropped and never reached the downstream analysis. You may not have noticed, but another variant in your VCF was excluded for the same reason: rs8099917 (19-39743165-T-G).

I fixed the issue by simply expanding the target region for IFNL3: from 19:39731245-39738646 to 19:39731245-39744165 for GRCh37, and from 19:39240552-39248006 to 19:39240552-39253525 for GRCh38. This way, all the IFNL3 variants currently used by PyPGx are included. I also checked the other genes and am glad to report that IFNL3 was the only one with this issue.
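
For anyone who wants to sanity-check the coordinates, here is a minimal sketch of the region test described above, using only the GRCh37 positions quoted in this thread. The names `Region` and `variants_outside` are hypothetical helpers written for illustration and are not part of the PyPGx API; the sketch also assumes the region bounds are inclusive, which may not match how PyPGx filters variants internally.

```python
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic interval (hypothetical helper, not a PyPGx class)."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        # Assumption: both ends of the interval are inclusive.
        return chrom == self.chrom and self.start <= pos <= self.end


# GRCh37 target regions for IFNL3 as quoted in this thread.
OLD_GRCH37 = Region("19", 39731245, 39738646)
NEW_GRCH37 = Region("19", 39731245, 39744165)

# GRCh37 positions of the three variants mentioned in this thread.
VARIANTS: Dict[str, Tuple[str, int]] = {
    "rs12980275": ("19", 39731783),
    "rs12979860": ("19", 39738787),
    "rs8099917": ("19", 39743165),
}


def variants_outside(region: Region,
                     variants: Dict[str, Tuple[str, int]]) -> List[str]:
    """Return the sorted names of variants that fall outside the region."""
    return sorted(
        name
        for name, (chrom, pos) in variants.items()
        if not region.contains(chrom, pos)
    )


print("Outside old region:", variants_outside(OLD_GRCH37, VARIANTS))
print("Outside new region:", variants_outside(NEW_GRCH37, VARIANTS))
```

With the old region, rs12979860 and rs8099917 are reported as outside while rs12980275 is inside, which matches the behaviour described in the original report; with the expanded region, none of the three is excluded.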

Thank you again for reporting the issue. I hope this helps!

Best regards, Steven