sbslee / pypgx

A Python package for pharmacogenomics (PGx) research
https://pypgx.readthedocs.io
MIT License

PYPGX workflow with premade VCF #128

Closed evyamor closed 7 months ago

evyamor commented 7 months ago

Hi, can I use PyPGx with a VCF created outside of it, such as one generated by DRAGEN? Do I still need the 'depth-of-coverage' and 'control-statistics' files? Specifically, I'm curious about running the pipeline directly on a non-PyPGx VCF: are additional files necessary, or can I work with just the VCF? Also, are there specific requirements for the VCF itself (its structure)? Thanks for your help, Evyatar

sbslee commented 7 months ago

Hi @evyamor,

You can certainly use your own VCF, such as one created by DRAGEN, to run PyPGx.

You only need 'depth-of-coverage' and 'control-statistics' files if you want to perform structural variation detection for complex genes such as CYP2D6.

I will list below some resources that might be useful to you:

Let me know if you have further questions.

evyamor commented 7 months ago

Thank you for the fast response!

I will test PyPGx on VCFs created via DRAGEN (and, separately, on BAM files I create). I wanted to know whether a GVCF is needed, or whether a VCF containing only the SNVs/indels is enough.

Also, as stated here: 'users had been instructed to create input VCF file from BAM files on their own using a variant caller of their choice (e.g. GATK4, bcftools, DRAGEN, DeepVariant). This can raise several potential problems such as decreased reproducibility of PyPGx results and users providing incorrectly formatted VCF to PyPGx.' I wanted to know whether PyPGx has a specific structure requirement for the VCF that would reduce these potential problems. (Here I refer mostly to the fields under the FORMAT column: DB, VDB, SGB...)

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18519_PyPGx HG01190_PyPGx NA12006_PyPGx NA18484_PyPGx NA07055_PyPGx NA18980_PyPGx NA19213_PyPGx NA12813_PyPGx NA19003_PyPGx NA10831_PyPGx NA18524_PyPGx NA10851_PyPGx NA18966_PyPGx HG00589_PyPGx NA18855_PyPGx NA18544_PyPGx NA18518_PyPGx NA18973_PyPGx NA19143_PyPGx NA18992_PyPGx NA12873_PyPGx NA19207_PyPGx NA18942_PyPGx NA19178_PyPGx NA19789_PyPGx NA19122_PyPGx NA19174_PyPGx NA18868_PyPGx HG00436_PyPGx HG00276_PyPGx NA19239_PyPGx NA19109_PyPGx NA20509_PyPGx NA10854_PyPGx NA19226_PyPGx NA10847_PyPGx NA18552_PyPGx NA18526_PyPGx NA07029_PyPGx NA06991_PyPGx NA11832_PyPGx NA21781_PyPGx NA12145_PyPGx NA19007_PyPGx NA18861_PyPGx NA12156_PyPGx NA18952_PyPGx NA18565_PyPGx NA19920_PyPGx NA12003_PyPGx NA20296_PyPGx NA07019_PyPGx NA07056_PyPGx NA11993_PyPGx NA19147_PyPGx NA19819_PyPGx NA07000_PyPGx NA18540_PyPGx NA19095_PyPGx NA18509_PyPGx NA19917_PyPGx NA18617_PyPGx NA07357_PyPGx NA19176_PyPGx NA18959_PyPGx NA07348_PyPGx NA18564_PyPGx NA19908_PyPGx NA11839_PyPGx NA12717_PyPGx

chr1 47261780 . T C 235.707 PASS DP=1519;VDB=0.326231;SGB=-40.8249;RPBZ=0.398415;MQBZ=-15.2308;MQSBZ=0.889911;BQBZ=-10.8447;SCBZ=0.105486;FS=0;MQ0F=0;AC=120;AN=140;DP4=205,13,1153,122;MQ=49 GT:PL:AD 0/0:0,57,255:19,0 0/1:204,0,172:10,11 1/1:240,45,0:0,15 0/1:147,0,165:11,10 1/1:246,54,0:0,18 1/1:255,66,0:0,22 0/1:134,0,182:15,9 1/1:255,87,0:0,29 1/1:231,54,0:0,18 1/1:224,57,0:0,19 1/1:248,36,0:0,12 0/1:120,0,176:9,7 1/1:255,54,0:0,18 1/1:198,75,0:0,25 0/1:168,0,127:7,12 1/1:255,57,0:0,19 0/1:105,0,183:9,5 1/1:223,51,0:0,17 1/1:255,63,0:0,21 1/1:255,80,0:1,31 1/1:189,60,0:0,20 0/1:148,0,214:10,12 1/1:191,45,0:0,15 0/1:98,0,175:15,6 1/1:255,69,0:0,23 0/1:158,0,100:7,16 0/1:161,0,114:5,12 0/1:255,0,138:9,14 1/1:247,81,0:0,27 1/1:227,57,0:0,19 1/1:255,63,0:0,21 1/1:255,69,0:0,23 1/1:255,75,0:0,25 1/1:255,84,0:0,28 0/1:202,0,190:14,15 1/1:224,69,0:0,23 1/1:255,66,0:0,22 1/1:255,63,0:0,21 1/1:255,39,0:0,13 1/1:255,51,0:0,17 1/1:255,72,0:0,24 1/1:231,63,0:0,21 1/1:255,78,0:0,26 1/1:255,75,0:0,25 0/1:145,0,227:16,10 1/1:200,72,0:0,24 1/1:205,72,0:0,24 1/1:207,66,0:0,22 0/1:109,0,172:12,8 0/1:174,0,135:9,14 1/1:255,66,0:0,22 1/1:255,45,0:0,15 1/1:249,54,0:0,18 1/1:255,54,0:0,18 1/1:230,72,0:0,24 1/1:247,63,0:0,21 1/1:211,81,0:0,27 1/1:255,54,0:0,18 0/1:167,0,193:13,13 1/1:255,72,0:0,24 0/1:76,0,159:11,4 1/1:236,66,0:0,22 1/1:255,78,0:0,26 1/1:218,45,0:0,15 1/1:255,60,0:0,20 1/1:255,66,0:0,22 1/1:202,78,0:0,26 1/1:255,81,0:0,27 0/1:181,0,176:16,11 1/1:231,33,0:0,11
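For readers comparing their own VCF against the record above, it can help to separate the site-level INFO keys (DP, VDB, SGB, ...) from the per-sample FORMAT keys (GT, PL, AD). The following is a minimal sketch using only the Python standard library. The function name `split_vcf_record` is hypothetical, the record is a copy of the line above truncated to its first three samples, and nothing here states which fields PyPGx itself requires.

```python
def split_vcf_record(line):
    """Split one VCF data line into INFO keys, FORMAT keys, and sample values."""
    # Real VCF files are tab-delimited; split() also copes with the
    # space-separated form shown in this thread.
    cols = line.rstrip("\n").split()
    info, fmt = cols[7], cols[8]
    # INFO entries look like KEY=VALUE (or a bare flag), joined by ';'
    info_keys = [item.split("=", 1)[0] for item in info.split(";")]
    # FORMAT lists the per-sample keys, joined by ':'
    format_keys = fmt.split(":")
    # Each sample column carries values in the same order as FORMAT
    samples = [dict(zip(format_keys, s.split(":"))) for s in cols[9:]]
    return info_keys, format_keys, samples


# Copy of the record above, truncated to its first three samples
record = (
    "chr1 47261780 . T C 235.707 PASS "
    "DP=1519;VDB=0.326231;SGB=-40.8249;RPBZ=0.398415;MQBZ=-15.2308;"
    "MQSBZ=0.889911;BQBZ=-10.8447;SCBZ=0.105486;FS=0;MQ0F=0;AC=120;"
    "AN=140;DP4=205,13,1153,122;MQ=49 "
    "GT:PL:AD 0/0:0,57,255:19,0 0/1:204,0,172:10,11 1/1:240,45,0:0,15"
)

info_keys, format_keys, samples = split_vcf_record(record)
print("INFO keys:  ", info_keys)
print("FORMAT keys:", format_keys)
print("Sample 2 AD:", samples[1]["AD"])
```

In this record VDB and SGB appear in the INFO column, while the FORMAT column holds only GT, PL, and AD.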

I have also completed the tutorial with the given files successfully (I only needed to re-index the files, which is easy to do). Thank you for this comprehensive tutorial as well. Much appreciated. Have a wonderful day, Evyatar