Closed axelgschwind closed 1 year ago
Hi @axelgschwind,
Thank you for the bug report. This seems to actually be a problem with the GX header definition in your VCF. It is set to Number=1
, instead of Number=.
. The latter would indicate to the VCF parser that this field is a list and if isinstance(expressions, list)
would evaluate to True
. I manually edited your VCF to fix this header and confirmed that the error no longer occurs.
Did you use the vcf-expression-annotator to create this file? I checked the source code for the vcf-expression-annotator and the header is supposed to be written as Number=.
so I'm not sure why it's using the wrong format in your case. Can you please confirm which version of vatools you were using? Also, if you could attach the input files you provided for the vcf-expression-annotator and the exact command you ran, that would be great so I can try to replicate this on my end.
Hi @susannasiebert ,
thank you very much for your help! I had written a small script annotating my VCFs. The script caused this small error in the VCF header. But with your help I could fix the error and solve the problem.
Therefore, I'm closing this issue now.
Best,
Axel
Installation Type
Standalone
pVACtools Version / Docker Image
pvactools 3.1.3 / pvactools 4.0.0
Python Version
python 3.8
Operating System
Ubuntu 20.04.5 LTS
Describe the bug
Hi!
I annotated a somatic VCF file as described in the documentary. I added gene expression information in terms of gene ids. One SNV can affect multiple genes. Thus, the gene expression should be annotated as comma-separated values, as described in the documentation.
However, if one SNV has multiple genes and I annotate multiple comma-separated gene expression values, I get an error as copied into the log output.
I traced the problem to the file "input_file_converter.py". The algorithm of the method "execute(self) tries to unpack the expression values into a tuple of variables. However, it does not go into a loop (which would be necessary in the case of multiple gene_ids, as can also be seen in the case of transcript-based annotation):
My current workaround is to just use the first coding gene as a reference for the expression. However, I think it is desirable to fix the code or the documentation.
I attached a minimum not-working input VCF.
Cheers,
Axel
How to reproduce this bug
Input files
bug_report.vcf.gz
Log output
Executing MHC Class I predictions Converting .vcf to TSV TSV file already exists. Skipping. Converting VCF to TSV /mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/vcfpy/parser.py:251: CannotConvertValue: 0/1 cannot be converted to Float, keeping as string. warnings.warn( Traceback (most recent call last): File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/bin/pvacseq", line 8, in
sys.exit(main())
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/tools/pvacseq/main.py", line 116, in main
args[0].func.main(args[1])
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/tools/pvacseq/run.py", line 133, in main
pipeline.execute()
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/lib/pipeline.py", line 424, in execute
self.generate_combined_fasta(self.fasta_file_path())
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/lib/pipeline.py", line 139, in generate_combined_fasta
generate_combined_fasta.main(params)
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/tools/pvacseq/generate_protein_fasta.py", line 161, in main
proximal_variants_tsv = convert_vcf(args.input_vcf, temp_dir, args.sample_name, args.phased_proximal_variants_vcf, args.flanking_sequence_length, args.pass_only)
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/tools/pvacseq/generate_protein_fasta.py", line 90, in convert_vcf
converter.execute()
File "/mnt/storage2/users/ahgscha1/phd/tools/pVACseq/.env/lib/python3.8/site-packages/pvactools/lib/input_file_converter.py", line 421, in execute
(item, value) = expressions.split('|')
ValueError: too many values to unpack (expected 2)
make: *** [Makefile:18: create_bug_report] Error 1
Output files
No response