griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

Failed to compile plugin Wildtype: Excessively long <> operator at Wildtype.pm line 20. #705

Closed asmlgkj closed 2 years ago

asmlgkj commented 3 years ago

thanks a lot

Describe the bug

Failed to compile plugin Wildtype: Excessively long <> operator at /data/database/anno/vep104/VEP_plugins-release-104/Wildtype.pm line 20.
Use of uninitialized value in addition (+) at Frameshift.pm line 129, <$fh> line 281368

To Reproduce

docker run --rm --user $(id -u):$(id -g) -v /home/DATA/kobe:/data docker.io/ensemblorg/ensembl-vep vep --input_file /data/FL202109054CASE.filter123.vcf.gz --output_file /data/FL202109054CASE.filter123.vcf.gz_vep --dir_cache /data/database/anno/vep104 --dir_plugins /data/database/anno/vep104/VEP_plugins-release-104 --fasta /data/database/anno/vep104/Homo_sapiens.GRCh37.dna.primary_assembly_chr.fa --offline --cache --force_overwrite --transcript_version --refseq --assembly GRCh37 --format vcf --cache_version 104 --keep_csq --variant_class --vcf --sift b --polyphen b --ccds --hgvs --symbol --numbers --canonical --gene_phenotype --af_1kg --af_esp --af_gnomad --pubmed --var_synonyms --variant_class --fork 4 --check_existing --phased --numbers --xref_refseq --plugin Frameshift --plugin Wildtype --tsl --terms SO

Log Output

Bareword found where operator expected at /data/database/anno/vep104/VEP_plugins-release-104/Wildtype.pm line 8, near ""en" data"
    (Missing operator before data?)
Bareword found where operator expected at /data/database/anno/vep104/VEP_plugins-release-104/Wildtype.pm line 8, near ""auto" data"
    (Missing operator before data?)
Bareword found where operator expected at /data/database/anno/vep104/VEP_plugins-release-104/Wildtype.pm line 8, near ""light" data"
    (Missing operator before data?)
WARNING: Failed to compile plugin Wildtype: Excessively long <> operator at /data/database/anno/vep104/VEP_plugins-release-104/Wildtype.pm line 20.
Compilation failed in require at (eval 41) line 2.
BEGIN failed--compilation aborted at (eval 41) line 2.

2021-09-20 14:37:53 - INFO: BAM-edited cache detected, enabling --use_transcript_ref; use --use_given_ref to override this
WARNING: 134 : Use of uninitialized value in numeric lt (<) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 262216.
Use of uninitialized value in numeric lt (<) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 262216.
Use of uninitialized value in addition (+) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 262216.
WARNING: 190 : Use of uninitialized value in numeric lt (<) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 281368.
Use of uninitialized value in numeric lt (<) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 281368.
Use of uninitialized value in addition (+) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 281368.
Use of uninitialized value in numeric lt (<) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 281368.
Use of uninitialized value in numeric lt (<) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 281368.
Use of uninitialized value in addition (+) at /data/database/anno/vep104/VEP_plugins-release-104/Frameshift.pm line 129, <$fh> line 281368.
WARNING: VCF line on line 67325 looks incomplete, skipping:
chr22   42523635    .   G   <DEL>   177 PASS    STATUS=StrongLOH;SAMPLE=FL202109054;TYPE=DEL;DP=1002;VD=0;AF=0;SHIFT3=0;MSI=0;MSILEN=0;SSF=0;SOR=0;LSEQ=TCGATCTCCTGTTGGACACG;RSEQ=GGGATGTCATATGGGTCACA  GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM    0/0:1002:0:0,0:330,550:880,0:0:0:0:0:0:0:1:0:0:0:0:0:0  0/1:415:28:14,14:188,171:359,28:0.0675:2,2:75:1:37:1:0.84624:1.09915:39.6:56:0.0636:0.0675:0.1

Output File

image

asmlgkj commented 3 years ago

image

susannasiebert commented 3 years ago

I believe that this issue is encountered when the plugin files (in your case Wildtype.pm) are downloaded as html. Did you use the pvacseq install_vep_plugin command to download the plugin files or did you download from GitHub manually? Please ensure you are downloading the raw file (https://raw.githubusercontent.com/griffithlab/pVACtools/master/tools/pvacseq/VEP_plugins/Wildtype.pm). If this doesn't resolve your issue, please attach the Wildtype.pm you are using for further debugging.

Since VEP was unable to compile your Wildtype.pm file, it did not complete annotation with that plugin which is why you are seeing the pVACseq error in your second screenshot.
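A quick way to test the "downloaded as HTML" hypothesis is to look for HTML markup at the top of the plugin file. The sketch below is only illustrative: the function name `looks_like_html` and the marker list are assumptions, not part of pVACtools or VEP. The "Bareword found" errors near "en" data and "auto" data in the log would be consistent with HTML attributes on an `<html>` tag, but that is an inference from the log, not something the log states.

```python
from pathlib import Path

# Hypothetical marker list: strings that commonly appear near the top of an
# HTML page but should not appear at the top of a Perl module.
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body")


def looks_like_html(text: str) -> bool:
    """Return True if the first few KB of text contain typical HTML markup."""
    head = text[:4096].lower()
    return any(marker in head for marker in HTML_MARKERS)


def plugin_looks_like_html(path: str) -> bool:
    """Read a plugin file from disk and apply the same check."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return looks_like_html(text)
```

Calling `plugin_looks_like_html("Wildtype.pm")` on the file in the plugin directory should return False for a raw download; if it returns True, downloading the raw file again is the likely fix.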

asmlgkj commented 3 years ago

Thanks, the Wildtype.pm is OK now, but Frameshift still does not work. My VCF is from VarDict. image

asmlgkj commented 3 years ago

Thanks a lot. After ignoring the warning, I ran this command:

docker run --rm --user $(id -u):$(id -g) -v /home/DATA/kobe:/data docker.io/griffithlab/pvactools pvacseq run /data/FL202109054CASE.filter123_vep.vcf_line_no_dot.filter.vcf_line_new FL202109054 HLA-A*02:01,HLA-B*35:01 MHCflurry NetMHCpan NetMHCIIpan /data -e1 8,9,10,11 --iedb-install-directory /opt/iedb -t 8 --tdna-vaf 0.05 --normal-sample-name FL202109054_N -k --downstream-sequence-length 1000

Then another strange thing happens: image

Because the VEP-annotated VCF is too big, I extracted only the header and the frameshift lines with grep -e '#' -e 'frameshift_variant' > test.vcf and renamed the result to .zip just so that I could upload it to GitHub. You can simply rename it back to .vcf: test.ZIP

This is the original VCF used for VEP annotation: FL202109054CASE.filter123.vcf.gz

asmlgkj commented 3 years ago

This is the .pm file I used, renamed just for uploading. Thanks a lot. Frameshift.zip

susannasiebert commented 3 years ago

Taking a quick glance at your VCF, it looks like the variants in your VCF aren't actually called in your tumor sample, i.e. they have a 0/0 genotype. pVACseq only processes variants that were actually observed in your sample of interest.

asmlgkj commented 3 years ago

This is what VarDict outputs; it has 0/0, 0/1, and 1/1 genotypes.

susannasiebert commented 3 years ago
zgrep -v '#' ~/Downloads/FL202109054CASE.filter123.vcf.gz | cut -f 10 | cut -f 1 -d : | sort | uniq -c
70590 0/0

As you can see, all of the variants in the VCF you provided are homozygous reference in the tumor sample, i.e. the variants are not observed in the tumor sample. They cannot be processed by pVACseq.
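The same tally can be sketched in Python, which makes the column choice explicit. This is a minimal illustration rather than anything pVACtools runs: the function names are hypothetical, and `sample_index=0` assumes the sample of interest is the first sample column (column 10, as in the `cut -f 10` above). It is worth confirming the column order against the `#CHROM` header line of the VCF. One small difference: this version skips lines that start with `#`, whereas `zgrep -v '#'` drops any line containing `#`.

```python
import gzip
from collections import Counter


def count_genotypes(lines, sample_index=0):
    """Count GT values for one sample column across VCF data lines.

    sample_index=0 means the first sample column (the 10th tab-separated
    field); this mirrors `cut -f 10 | cut -f 1 -d :`.
    """
    counts = Counter()
    column = 9 + sample_index
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # incomplete record, nothing to count
        genotype = fields[column].split(":", 1)[0]
        counts[genotype] += 1
    return counts


def count_genotypes_in_file(path, sample_index=0):
    """Open a plain or gzip-compressed VCF and count genotypes."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_genotypes(handle, sample_index)
```

Running `count_genotypes_in_file(path, 0)` and `count_genotypes_in_file(path, 1)` side by side shows quickly whether the tumor and normal columns are in the expected order.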

asmlgkj commented 3 years ago

So which genotypes does pVACtools use? Does it first filter out variants that are 0/0 in the tumor sample?

In the code at https://github.com/genome/analysis-workflows/blob/master/definitions/pipelines/detect_variants.cwl I cannot find a VarDict VCF, only this combine step:

combine:
    run: ../tools/combine_variants.cwl
    in:
        reference: reference
        mutect_vcf: mutect/filtered_vcf
        strelka_vcf: strelka/filtered_vcf
        varscan_vcf: varscan/filtered_vcf
        pindel_vcf: pindel/filtered_vcf
    out: [combined_vcf]

Does that mean VarDict output has not been tested?

If I just replace all the 0/0 genotypes in the tumor sample (in tumor-only or paired mode) with sed '/^chr/s#0/0#0/1#' a.vcf > b.vcf, will pVACtools use all the variants?

Hoping for an answer, thanks a lot.

asmlgkj commented 3 years ago

Any comment on this, @susannasiebert? Thanks a lot.

susannasiebert commented 3 years ago

I don't believe we've tried vardict. @jasonwalker80 @malachig can you confirm?

Editing the GT field the way you are doing should work but I think it will edit both the tumor and the sample genotypes. If you want to preserve the existing information you can also add a third, dummy sample to your VCF using the VAtools vcf-genotype-annotator (https://vatools.readthedocs.io/en/latest/vcf_genotype_annotator.html). You will need to redo any work you have done to add readcounts and expression information to your tumor sample and add them to the new dummy sample instead.

I don't know how you are planning on using the results from pVACseq but I would be very careful with any changes like this. You don't want to predict epitopes for variants that don't actually occur in a patient as vaccination with them will at best not work and at worst elicit a response against normal cells.

asmlgkj commented 3 years ago

Thanks a lot. What do you mean by "edit both the tumor and the sample genotypes"? How would I edit them?

If I just replace all the 0/0 genotypes in the tumor sample (in tumor-only or paired mode) with sed '/^chr/s#0/0#0/1#' a.vcf > b.vcf, will pVACtools use all the variants? I would like to know how pVACseq selects variants by GT. Is there any concrete information about this filter?

susannasiebert commented 3 years ago

I apologize. I meant to say that your sed command will replace 0/0 in both the tumor and the normal sample. If you're ok with that then your sed command will work. pVACseq accepts any variant that's not homozygous reference (0/0). 0/1, 1/0, and 1/1 genotypes will all be processed.
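If the goal is to change the genotype in one sample column only, a column-aware edit is less fragile than a line-wide substitution. Note also that the sed expression as quoted has no g flag, so it substitutes only the first 0/0 match on each line; which column that lands in depends on the record. The sketch below is a hypothetical helper, not a pVACtools or VAtools function, and `sample_index=0` (first sample column is the tumor) is an assumption to check against the `#CHROM` header. The earlier caution about predicting epitopes for variants that are not really present applies to this edit just as much as to the sed one.

```python
def set_called_genotype(line, sample_index=0, new_gt="0/1"):
    """Rewrite a homozygous-reference GT in one sample column of a VCF line.

    Header lines, records without a GT key, and genotypes other than
    0/0 or 0|0 are returned unchanged. All other columns are untouched.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    column = 9 + sample_index
    if len(fields) <= column:
        return line  # incomplete record, leave as-is
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return line
    gt_position = format_keys.index("GT")
    values = fields[column].split(":")
    if values[gt_position] in ("0/0", "0|0"):
        values[gt_position] = new_gt
        fields[column] = ":".join(values)
    return "\t".join(fields) + newline
```

Applied line by line to the input VCF, this leaves the normal sample's 0/0 genotypes as they were, so the original calls remain visible in the output.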

asmlgkj commented 3 years ago

@susannasiebert thanks a lot. You wrote that pVACseq accepts any variant that's not homozygous reference (0/0). Does this apply to the tumor GT, to the normal GT, or must both be 0/1? By the way, do you know what 1/0 means? I really cannot find any reference about 1/0.

susannasiebert commented 3 years ago

Only the tumor sample needs to be called. The genotype of the normal sample is not taken into account.

I don't know what 1/0 means. It might just be a chromosome-aware representation of 0/1 but that would be weird without phasing so I really don't know for sure.

malachig commented 3 years ago

Normally the order of the alleles (1|0 vs 0|1) is used to define phasing. A haplotype is built by combining all alleles in the first column and all alleles in the second column. Normally phased alleles are indicated by separation with a "|" instead of a "/". An example of "1|0" is shown in the first page of the VCF spec:

https://samtools.github.io/hts-specs/VCFv4.2.pdf

Here you have "1/0" though. Maybe it still relates to phasing but from a tool process that does not use the "|"? Just speculation. How was your VCF created, out of curiosity?
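To make the "/" versus "|" distinction concrete, here is a small sketch that splits a GT string into its alleles and a phased flag, plus a check that mirrors the rule stated earlier in the thread (anything other than homozygous reference counts as called). The function names are hypothetical and this is not pVACseq's implementation; in particular, treating a missing genotype such as "./." as not called is an assumption made for the example.

```python
import re


def parse_gt(gt):
    """Split a VCF GT string into (alleles, phased).

    "|" separates phased alleles and "/" separates unphased ones,
    as described in the VCF specification.
    """
    phased = "|" in gt
    alleles = re.split(r"[/|]", gt)
    return alleles, phased


def is_called(gt):
    """Return True if at least one allele is neither reference (0) nor missing (.)."""
    alleles, _ = parse_gt(gt)
    return any(allele not in ("0", ".") for allele in alleles)
```

Under this reading, "1/0" and "0/1" carry the same alleles and are both unphased, so a filter of this kind would treat them identically.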

asmlgkj commented 3 years ago

So sed '/^chr/s#0/0#0/1#' still works for the variants that I want to pass to pVACseq, because this command will not change 1/0, 0/1, or 1/1 in the tumor GT but only replaces 0/0. Am I right? Thanks a lot.

susannasiebert commented 3 years ago

so sed '/^chr/s#0/0#0/1#' still works for the variants that I want to pass to pvacseq, because this command will not change 1/0 0/1 1/1 in the tumor GT , but just replaced 0/0 ,am I right?

Like I said previously, yes, it should work.

asmlgkj commented 3 years ago

Thanks a lot. It is a VCF generated by the tool VarDict; I do not know its internal phasing method. I am not sure what you mean by "How was your VCF created out of curiosity?". Because English is not my first language, I am afraid of misunderstanding your question.

asmlgkj commented 3 years ago

Thanks a lot.

susannasiebert commented 3 years ago

@malachig I believe this VCF was created using VarDict as the variant caller.

malachig commented 3 years ago

Some of the developers talking about this exact question: https://github.com/AstraZeneca-NGS/VarDict/issues/34

asmlgkj commented 3 years ago

Its GT is very strange: for variants tagged as StrongSomatic, the GT in the tumor is also 0/0. Here are some records:

chr1 2488079 . C T 115 f0.02;p8 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=SNV;DP=559;VD=6;AF=0.0107;SHIFT3=0;MSI=1.000;MSILEN=1;SSF=0.09318;SOR=Inf;LSEQ=CATCCTGCTAGCTGGGTTCC;RSEQ=GAGCTGCCGGTCTGAGCCTG GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:559:6:6,0:519,34:553,6:0.0107:2,0:6:1:44.8:1:1:0:60:12:0.0107:0.0089:1.3 0/0:270:0:0,0:156,114:270,0:0:2,0:43.7:1:36.3:1:1:0:60:89:1:0:0.5
chr1 2488170 . C A 104 f0.02 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=SNV;DP=643;VD=5;AF=0.0078;SHIFT3=0;MSI=1.000;MSILEN=1;SSF=0.20046;SOR=Inf;LSEQ=CCAAAACCGACGTCTTGAGG;RSEQ=TGGTGAGCCCCCGAGCCTCC GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:643:5:0,5:128,510:638,5:0.0078:2,0:9.4:1:45:1:0.5888:0:60:10:0.0078:0.0062:2.6 0/0:243:0:0,0:89,154:243,0:0:2,0:39.6:1:36.6:1:1:0:60:120.5:1:0:0.7
chr1 2489179 . C A 45 v3;f0.02 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=SNV;DP=427;VD=2;AF=0.0047;SHIFT3=0;MSI=2.000;MSILEN=1;SSF=0.41822;SOR=Inf;LSEQ=CCTTAGGTGCTGTATCTCAC;RSEQ=TTCCTGGGAGCCCCCTGCTA GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:427:2:2,0:240,185:425,2:0.0047:2,0:21.5:1:45:1:0.50776:0:60:4:0.0047:0.0023:1 0/0:233:0:0,0:129,104:233,0:0:2,0:40.7:1:36.2:1:1:0:60:45.6:1:0:0.2
chr1 2489200 . C T 45 v3;f0.02 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=SNV;DP=449;VD=2;AF=0.0045;SHIFT3=0;MSI=1.000;MSILEN=1;SSF=0.4111;SOR=Inf;LSEQ=TTCCTGGGAGCCCCCTGCTA;RSEQ=GCCCCAGCTCTGCCGTCCTG GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:449:2:1,1:213,234:447,2:0.0045:2,2:39:1:45:0:1:1.10:60:4:0.0045:0:1 0/0:251:0:0,0:120,131:251,0:0:2,0:39.7:1:36.5:1:1:0:60:250:1:0:0.2
chr1 2489235 . AC A 45 v3;f0.02 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=Deletion;DP=410;VD=2;AF=0.0049;SHIFT3=2;MSI=3.000;MSILEN=1;SSF=0.3879;SOR=Inf;LSEQ=GTCCTGCAAGGAGGACGAGT;RSEQ=CCAGTGGGCTCCGAGTGCTG GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:410:2:0,2:146,262:408,2:0.0049:2,0:17:1:45:1:0.54029:0:60:4:0.0049:0.0024:1 0/0:248:0:0,0:104,144:248,0:0:2,0:40.5:1:36.4:1:1:0:60:81.667:1:0:0.2
chr1 2489837 . C T 70 f0.02 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=SNV;DP=287;VD=3;AF=0.0105;SHIFT3=0;MSI=3.000;MSILEN=1;SSF=0.19171;SOR=Inf;LSEQ=GGCACAGTGTGTGAACCCTG;RSEQ=CCTCCAGGCACCTACATTGC GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:287:3:3,0:126,158:284,3:0.0105:2,0:13.7:1:44.7:1:0.08964:0:60:6:0.0105:0.007:2.3 0/0:210:0:0,0:110,100:210,0:0:2,0:40.9:1:36:1:1:0:60:41:1:0:0.3
chr1 2489914 . G A 45 v3;f0.02 STATUS=StrongSomatic;SAMPLE=HH0321091801;TYPE=SNV;DP=265;VD=2;AF=0.0075;SHIFT3=0;MSI=2.000;MSILEN=1;SSF=0.33134;SOR=Inf;LSEQ=AATGTGTGACCCAGGTAAGA;RSEQ=GCCAGCACAGCCGGCCCAGC GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/0:265:2:1,1:107,156:263,2:0.0075:2,2:25:1:45:1:1:1.46:60:4:0.0075:0.0038:1.5 0/0:195:0:0,0:85,110:195,0:0:2,0:36.3:1:36.3:1:1:0:60:64:1:0:0.2

susannasiebert commented 3 years ago

@asmlgkj Which version of VarDict are you using?

asmlgkj commented 3 years ago

Thanks a lot. I am using the latest release, https://github.com/AstraZeneca-NGS/VarDictJava/releases/tag/v1.8.2. I also cloned the GitHub repository and compiled it myself, and the result is the same.