openvax / vaxrank

Ranked vaccine peptides for personalized cancer immunotherapy
Apache License 2.0

ValueError: Invalid contig name 'chr14_GL000194v1_random' for reference 'GRCh38' #193

Open leldershaw opened 3 years ago

leldershaw commented 3 years ago

Vaxrank fails when it tries to find the contig 'chr14_GL000194v1_random' in the reference GRCh38, despite this being a valid contig name.

Command run:
vaxrank --vcf /home/ubuntu/data/Sample_07/Sample_07_tumor_v_Sample_07_normal.combine_variants.phased.annotated.vcf \
  --genome hg38 \
  --download-reference-genome-data \
  --bam /home/ubuntu/data/Sample_07-T-RNA/07-T-RNA_S13_R1_001.fastq.gz.subread.sorted.BAM \
  --mhc-predictor netmhc \
  --mhc-alleles HLA-A*01:01,HLA-B*08:01,HLA-C*07:01 \
  --mhc-epitope-lengths 9 \
  --padding-around-mutation 5 \
  --vaccine-peptide-length 17 \
  --output-ascii-report Mel5-vaccine-peptides-report.txt

Error output:
2021-05-13 15:37:10,781 - isovar.allele_reads:199 - INFO - Gathering reads for Variant(contig='Y', start=56875044, ref='T', alt='A', reference_name='GRCh38')
2021-05-13 15:37:10,782 - isovar.allele_reads:203 - INFO - Gathering variant reads for variant Variant(contig='Y', start=56875044, ref='T', alt='A', reference_name='GRCh38') (chromosome = chrY, gene names = [])
2021-05-13 15:37:10,816 - isovar.locus_reads:312 - INFO - Found 0 reads overlapping locus chrY: 56875043-56875045
2021-05-13 15:37:10,820 - isovar.translation:466 - INFO - No supporting reads for variant Variant(contig='Y', start=56875044, ref='T', alt='A', reference_name='GRCh38')
2021-05-13 15:37:10,822 - vaxrank.core_logic:246 - INFO - No protein sequences for Variant(contig='Y', start=56875044, ref='T', alt='A', reference_name='GRCh38')
2021-05-13 15:37:10,822 - isovar.allele_reads:199 - INFO - Gathering reads for Variant(contig='chr14_GL000194v1_random', start=53456, ref='C', alt='A', reference_name='GRCh38')
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/vaxrank", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/vaxrank/cli.py", line 389, in main
    data = ranked_variant_list_with_metadata(args)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/vaxrank/cli.py", line 314, in ranked_variant_list_with_metadata
    variants_count_dict = core_logic.variant_counts()
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/vaxrank/core_logic.py", line 331, in variant_counts
    if variant in self.isovar_protein_sequence_dict:
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/vaxrank/core_logic.py", line 243, in isovar_protein_sequence_dict
    for variant, isovar_protein_sequences in protein_sequences_generator:
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/isovar/protein_sequences.py", line 255, in reads_generator_to_protein_sequences_generator
    for (variant, overlapping_reads) in variant_and_overlapping_reads_generator:
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/isovar/allele_reads.py", line 275, in reads_overlapping_variants
    allele_reads = reads_overlapping_variant(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/isovar/allele_reads.py", line 207, in reads_overlapping_variant
    variant.gene_names)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/varcode/variant.py", line 435, in gene_names
    self._check_that_genome_has_contig()
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/varcode/variant.py", line 370, in _check_that_genome_has_contig
    raise ValueError("Invalid contig name '%s' for reference '%s'" % (
ValueError: Invalid contig name 'chr14_GL000194v1_random' for reference 'GRCh38'

doctorchenzx commented 1 year ago

Hi, I ran into the same problem. Have you found a resolution?

iskandr commented 1 year ago

This is a non-canonical contig (not one of the primary chromosomes), and the underlying annotation tools (PyEnsembl + Varcode) probably don't support it.
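In UCSC-style hg38 naming, only chr1 to chr22, chrX, chrY and chrM are primary chromosomes. A `_random` suffix marks an unlocalized scaffold (assigned to a chromosome but not placed on it), a `chrUn_` prefix marks an unplaced scaffold, and an `_alt` suffix marks an alternate locus, so 'chr14_GL000194v1_random' from the traceback is an unlocalized chr14 scaffold. The minimal sketch below classifies contig names by these conventions. `classify_grch38_contig` is a hypothetical helper, not part of Vaxrank, Varcode or PyEnsembl, and it inspects names only; it cannot tell whether a given annotation release actually covers a contig.

```python
import re

# Primary GRCh38 chromosomes, accepting both UCSC ("chr14", "chrM")
# and Ensembl-style ("14", "MT") names.
_PRIMARY = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def classify_grch38_contig(name: str) -> str:
    """Classify a GRCh38/hg38 contig name by UCSC naming conventions (name-based only)."""
    if _PRIMARY.match(name):
        return "primary"
    if name.endswith("_random"):
        return "unlocalized"  # chromosome known, position on it unknown
    if name.startswith("chrUn_"):
        return "unplaced"  # not assigned to any chromosome
    if name.endswith("_alt"):
        return "alt"  # alternate locus
    return "other"  # e.g. chrEBV, patches, decoys


print(classify_grch38_contig("chr14_GL000194v1_random"))  # unlocalized
```

Anything other than "primary" is a candidate for the filtering the maintainer suggests.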

I think the only immediate solution is to filter out variants on non-canonical contigs or to align against the canonical chromosomes only. I'll try to think more about the best path forward, though.
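The first workaround can be approximated by pre-filtering the VCF before passing it to `vaxrank --vcf`. This is a minimal sketch using only the Python standard library, assuming a plain or gzip-compressed VCF whose first tab-separated column is CHROM. `filter_vcf` and `filter_vcf_lines` are hypothetical helpers, not part of Vaxrank. Note that any variants on the dropped scaffolds will simply not be considered as vaccine candidates.

```python
import gzip
import re

# Primary GRCh38 chromosomes, with or without a "chr" prefix.
PRIMARY_CONTIG = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def filter_vcf_lines(lines):
    """Yield all header lines, plus only those records on primary chromosomes."""
    for line in lines:
        if line.startswith("#"):
            # Keep "##" meta-information lines and the "#CHROM" header unchanged.
            yield line
            continue
        contig = line.split("\t", 1)[0]
        if PRIMARY_CONTIG.match(contig):
            yield line


def filter_vcf(in_path, out_path):
    """Write an uncompressed copy of in_path restricted to primary chromosomes."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_vcf_lines(src))
```

For example, `filter_vcf("Sample_07.vcf", "Sample_07.primary.vcf")` (hypothetical file names) writes a filtered copy that can be given to `--vcf`. The `##contig` header lines for dropped scaffolds are left in place; that is assumed to be harmless, since no records reference them.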