griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

ERROR: There was a mismatch between the actual wildtype amino acid sequence (G) and the expected amino acid sequence (A). #1031

Closed PierreLaplante closed 8 months ago

PierreLaplante commented 11 months ago

Installation Type

Docker

pVACtools Version / Docker Image

griffithlab/pvactools:4.0.5

Python Version

3.10.10

Operating System

CentOs 7 HPC

Describe the bug

Hello, when I run pvacseq run on my samples, one of them returns the following error:

ERROR: There was a mismatch between the actual wildtype amino acid sequence (G) and the expected amino acid sequence (A). Did you use the same reference build version for VEP that you used for creating the VCF? OrderedDict([('chromosome_name', '1'), ('start', '170987628'), ('stop', '170987629'), ('reference', 'C'), ('variant', 'G'), ('gene_name', 'Mpz'), ('transcript_name', 'NM_001315500.1'), ('transcript_support_level', 'Not Supported'), ('transcript_length', '311'), ('biotype', 'protein_coding'), ('amino_acid_change', 'A/G'), ('codon_change', 'gCg/gGg'), ('ensembl_gene_id', '17528'), ('hgvsc', 'NM_001315500.1:c.764C>G'), ('hgvsp', 'NP_001302429.1:p.Ala255Gly'), ('wildtype_amino_acid_sequence', 'MAPGAPSSSPSPILAALLFSSLVLSPALAIVVYTDREIYGAVGSQVTLHCSFWSSEWVSDDISFTWRYQPEGGRDAISIFHYAKGQPYIDEVGTFKERIQWVGDPRWKDGSIVIHNLDYSDNGTFTCDVKNPPDIVGKTSQVTLYVFEKVPTRYGVVLGAVIGGILGVVLLLLLLFYLIRYCWLRRQAALQRRLSAMEKGRFHKSSKDSSKRGRQTPVLYAMLDHSRSTKAASEKKSKGLGESRKDKKRLAGRAGGRGSATESSKGSQVVVIEMELRKDEQSSELRPAVKSPSRTSLKNALKNMMGLDSDK'), ('frameshift_amino_acid_sequence', ''), ('fusion_amino_acid_sequence', ''), ('variant_type', 'missense'), ('protein_position', '255'), ('transcript_expression', 'NA'), ('gene_expression', '0.1250423952889849'), ('normal_depth', 'NA'), ('normal_vaf', 'NA'), ('tdna_depth', '39'), ('tdna_vaf', '0.217'), ('trna_depth', '0'), ('trna_vaf', '0.0'), ('index', '174.Mpz.NM_001315500.1.missense.255A/G'), ('protein_length_change', ''), ('fusion_read_support', 'NA'), ('fusion_expression', 'NA')])
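The error compares two values from the record above: the reference residue in amino_acid_change ('A') and the residue found at protein_position (255) in wildtype_amino_acid_sequence. The Python sketch below repeats that comparison by hand. It also derives the codon from the hgvsc field: c.764 falls in codon 255, second base, which is consistent with codon_change gCg/gGg. This is only an illustration that reuses the field names from the error. It is not pVACtools' own code, the helper names are hypothetical, and it handles only simple single-residue missense changes.

```python
import re
from typing import Tuple

# Hypothetical helpers illustrating the check. The field names are copied from the
# OrderedDict in the error message, but this is NOT pVACtools' implementation.


def cdna_position(hgvsc: str) -> int:
    """Extract a plain coding position from an HGVS c. string like 'NM_x:c.764C>G'."""
    return int(re.match(r"\d+", hgvsc.split(":c.")[1]).group())


def codon_of(cdna_pos: int) -> Tuple[int, int]:
    """Return (1-based codon number, 1-based base within that codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def check_wildtype_residue(record: dict) -> Tuple[bool, str, str]:
    """Compare the expected reference residue with the one in the wildtype sequence.

    Assumes a single-residue missense change (e.g. 'A/G' at protein_position '255').
    """
    expected = record["amino_acid_change"].split("/")[0]  # 'A/G' -> 'A'
    position = int(record["protein_position"])            # 1-based
    actual = record["wildtype_amino_acid_sequence"][position - 1]
    return actual == expected, actual, expected


# Values copied from the error message above.
record = {
    "hgvsc": "NM_001315500.1:c.764C>G",
    "amino_acid_change": "A/G",
    "protein_position": "255",
    "wildtype_amino_acid_sequence": (
        "MAPGAPSSSPSPILAALLFSSLVLSPALAIVVYTDREIYGAVGSQVTLHCSFWSSEWVSDDISFTWRYQPEGGRDAISIF"
        "HYAKGQPYIDEVGTFKERIQWVGDPRWKDGSIVIHNLDYSDNGTFTCDVKNPPDIVGKTSQVTLYVFEKVPTRYGVVLGA"
        "VIGGILGVVLLLLLLFYLIRYCWLRRQAALQRRLSAMEKGRFHKSSKDSSKRGRQTPVLYAMLDHSRSTKAASEKKSKGL"
        "GESRKDKKRLAGRAGGRGSATESSKGSQVVVIEMELRKDEQSSELRPAVKSPSRTSLKNALKNMMGLDSDK"
    ),
}

codon, base = codon_of(cdna_position(record["hgvsc"]))
ok, actual, expected = check_wildtype_residue(record)
print(f"codon {codon}, base {base}; match={ok}, actual={actual!r}, expected={expected!r}")
```

Running it reports match=False with actual 'G' and expected 'A', reproducing the mismatch from the error message.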

I understand the error message, but I'm not sure how this could have happened: I ran everything in batches, and none of my other samples produce the same error.

I'm not sure where to start in fixing it.

Thank you for your time and help.

How to reproduce this bug

singularity exec \
    --mount type=bind,src=/mnt/beegfs/scratch/p_laplante,dst=/mnt \
    pvac4.sif pvacseq run \
    /mnt/VEP_annotated_coveraged_VCF/12_2_Msh2KO_tumor_only_twicefiltered_T.vcf.gz.PASS_stFILT.vcf.gz.nogerm.vcf.gz_dec.vcf.gz.mm10.vcf.gz.sorted.vcf.gz.readcount.vcf.gz.mm39.vcf.gz.sorted.vcf.gz.VEP_anno.gx.vcf.gz \
    12_2_Msh2KO \
    H-2-Kd,H-2-Dd,H-2-Ld \
    NetMHCpan \
    /mnt/output_pvactools3/12_2_Msh2KO \
    --phased-proximal-variants-vcf /mnt/VEP_annotated_coveraged_VCF/phased/sorted/12_2_Msh2KO_phased.vcf.mm10.vcf.mm39.vcf.VEP_anno.vcf.gz.sorted.vcf.gz \
    --iedb-install-directory /opt/iedb

Input files

Here are the offending VCF and its .tbi index:

offending_vcf.gz_and_offending_vcf.tbi.zip

Here are the offending phased VCF and its .tbi index (on WeTransfer, because they are too large for GitHub): https://wetransfer.com/downloads/98ce8bb20baa79e8ea3efa0ae68cf2a420231009134822/466f1637f4a874cc57313d18ad94211d20231009134835/7d2ff8

Log output

(Identical to the error message quoted under "Describe the bug".)

Output files

No response

susannasiebert commented 9 months ago

There is a more in-depth explanation of this error in our documentation. We usually recommend running the ref-transcript-mismatch reporter to remove variants like this (using the --hard option).
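The recommended fix is to remove variants whose reference residue does not match the transcript before running pvacseq. As a conceptual sketch only, the Python below contrasts a "hard" filter, which drops mismatching records, with a softer mode that keeps them and only tags them. It works on in-memory records shaped like the one in the error, not on a VCF, and it is not the ref-transcript-mismatch reporter's code. The function names, the filter tag and the toy records are all hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def has_residue_mismatch(rec: Record) -> bool:
    """True if the reference residue in amino_acid_change is not found at
    protein_position in the wildtype sequence (single-residue missense only)."""
    expected = rec["amino_acid_change"].split("/")[0]
    pos = int(rec["protein_position"])
    seq = rec["wildtype_amino_acid_sequence"]
    return (not 1 <= pos <= len(seq)) or seq[pos - 1] != expected


def filter_mismatches(records: Iterable[Record], hard: bool = True) -> List[Record]:
    """hard=True drops mismatching records; hard=False keeps them with a tag."""
    kept: List[Record] = []
    for rec in records:
        if has_residue_mismatch(rec):
            if hard:
                continue  # remove the variant entirely
            rec = {**rec, "filter": "ref_transcript_mismatch"}  # hypothetical tag, copy
        kept.append(rec)
    return kept


# Toy records (hypothetical): position 2 of "MAK" is A (match), position 3 is K (mismatch).
records = [
    {"index": "ok", "amino_acid_change": "A/G", "protein_position": "2",
     "wildtype_amino_acid_sequence": "MAK"},
    {"index": "bad", "amino_acid_change": "A/G", "protein_position": "3",
     "wildtype_amino_acid_sequence": "MAK"},
]

print([r["index"] for r in filter_mismatches(records)])                    # ['ok']
print([r.get("filter") for r in filter_mismatches(records, hard=False)])   # [None, 'ref_transcript_mismatch']
```

Hard filtering is what lets pvacseq run to completion, at the cost of losing those variants from the candidate list. Soft tagging keeps them visible for manual review.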

susannasiebert commented 8 months ago

Closing this issue due to inactivity.