Closed evyamor closed 5 months ago
Hi @evyamor,
Could you please send me the input VCF file and the exact command line you used to replicate the error at sbstevenlee@gmail.com? It is difficult to solve the problem without seeing the data.
Hey, Thank you for the response. I sent the VCF file and the CLI to your email.
Much appreciated, Thank you, Evyatar
@evyamor,
Thank you so much for providing the requested information! It really helped me identify the issue. The problem was caused by the HLA contigs present in your input VCF. During the pre-processing step, PyPGx first checks whether the input VCF has the chr
string or not (e.g., chr2
vs. 2
). Your VCF file happens to have variants from the HLA contigs as the top records:
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT KU_01539
HLA-DRB1*03:01:01:01 7926 . C G 4.45 . AC=1;AF=0.5;AN=2;DP=21;FS=2.172;MQ=98.91;MQRankSum=-3.344;QD=0.96;ReadPosRankSum=2.106;SOR=1.609;FractionInformativeReads=1 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 0/1:16,5:0.238:21:7,0:9,5:4:37,0,24:4.4539,1.9373,28.946:0,34.77,37.77:7,9,1,4:8,8,4,1
...
chr1 19776 . A G 27.21 . AC=2;AF=1;AN=2;DP=1;FS=0;MQ=10;MQRankSum=0;QD=7.78;ReadPosRankSum=0;SOR=1.609;FractionInformativeReads=1 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 1/1:0,1:1:1:0,1:0,0:24:65,29,0:27.21,26.131,0.018882:0,34.77,37.77:0,0,0,1:0,0,0,1
...
When PyPGx saw this, it automatically determined that the input VCF file does not have the chr
string, when clearly your VCF file does. Therefore, I updated the fuc.pyvcf.has_chr_prefix
method (see the commit) in the 0.38.0-dev
branch to ignore the HLA contigs when determining whether the input VCF has the chr
string. I confirm that after this fix everything works as expected (see the PyPGx results for CYP2C9 I sent you via email, as an example).
At this point, you have two options: You can either use the development version of the fuc
package (i.e., 0.38.0-dev
) or you can remove the HLA variants from your VCF file. Both should work.
Thank you again for reporting the issue. Please let me know if you have any questions.
Best regards, Steven
Thank you very much for checking these files and fixing the necessary functions. I will remove the HLA contigs from my current VCF file as suggested. Your assistance has been invaluable and is most appreciated. Thank you again.
Best regards, Evyatar
Hello, when using an outside prepared VCF file ( .vcf.gz & .tbi ) and executing 'run-ngs-pipeline' I keep getting this type of error:
Traceback (most recent call last): File "/home/ubuntu/.local/bin/pypgx", line 8, in
sys.exit(main())
File "/home/ubuntu/.local/lib/python3.8/site-packages/pypgx/main.py", line 33, in main
commands[args.command].main(args)
File "/home/ubuntu/.local/lib/python3.8/site-packages/pypgx/cli/run_ngs_pipeline.py", line 159, in main
pipeline.run_ngs_pipeline(
File "/home/ubuntu/.local/lib/python3.8/site-packages/pypgx/api/pipeline.py", line 230, in run_ngs_pipeline
imported_variants = utils.import_variants(gene, variants,
File "/home/ubuntu/.local/lib/python3.8/site-packages/pypgx/api/utils.py", line 1041, in import_variants
vf = pyvcf.VcfFrame.from_file(vcf, regions=region)
File "/home/ubuntu/.local/lib/python3.8/site-packages/fuc/api/pyvcf.py", line 2356, in from_file
s = slice(fn, regions)
File "/home/ubuntu/.local/lib/python3.8/site-packages/fuc/api/pyvcf.py", line 1394, in slice
for record in vcf.fetch(chrom, start, end):
File "pysam/libcbcf.pyx", line 4466, in pysam.libcbcf.VariantFile.fetch
File "pysam/libchtslib.pyx", line 683, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig
12
For all contigs. Note, all entries have 'chr' prefix, e.g. chr12 And the contigs themselves looks like
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig=
contig= ...........
Both in a VCF prepared with DRAGEN, and for comparison purposes also a generated VCF from a BAM file with PYPGX. I'm trying to understand what's causing this error and how to fix this issue In both VCFs the contigs and entries are written the same way.
Thanks for your time, Evyatar