openvax / topiary

Predict mutated T-cell epitopes from sequencing data
Apache License 2.0
27 stars 9 forks source link

Is there a way to pass in the reference/release when it can't be infered #11

Closed tavinathanson closed 8 years ago

tavinathanson commented 9 years ago

I'm using the following dummy VCF:

##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
20  61795   .   G   T   .   PASS    DP=81;SS=1;SSC=2;GPV=4.6768E-16;SPV=5.4057E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:44:22:22:50%:16,6,9,13    0/1:.:37:18:19:51.35%:10,8,10,9
20  62731   .   C   A   .   PASS    DP=68;SS=1;SSC=1;GPV=1.4855E-11;SPV=7.5053E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:32:17:15:46.88%:9,8,9,6   0/1:.:36:21:15:41.67%:8,13,8,7
20  63799   .   C   T   .   PASS    DP=72;SS=1;SSC=7;GPV=3.6893E-16;SPV=1.8005E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:39:19:19:50%:8,11,11,8    0/1:.:33:12:21:63.64%:5,7,8,13
20  65288   .   G   T   .   PASS    DP=35;SS=1;SSC=0;GPV=7.8434E-5;SPV=8.2705E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:21:13:8:38.1%:4,9,0,8 0/1:.:14:10:4:28.57%:2,8,0,4
20  65900   .   G   A   .   PASS    DP=53;SS=1;SSC=0;GPV=1.5943E-31;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:26:0:26:100%:0,0,12,14    1/1:.:27:0:27:100%:0,0,15,12
20  66370   .   G   A   .   PASS    DP=66;SS=1;SSC=0;GPV=2.6498E-39;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:32:0:32:100%:0,0,11,21    1/1:.:34:0:34:100%:0,0,15,19
20  68749   .   T   C   .   PASS    DP=64;SS=1;SSC=0;GPV=4.1752E-38;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:23:0:23:100%:0,0,7,16 1/1:.:41:0:41:100%:0,0,21,20
20  69094   .   G   A   .   PASS    DP=25;SS=1;SSC=8;GPV=4.2836E-5;SPV=1.5657E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:12:8:4:33.33%:5,3,4,0 0/1:.:13:5:8:61.54%:3,2,6,2
20  69408   .   C   T   .   PASS    DP=53;SS=1;SSC=0;GPV=8.7266E-12;SPV=9.8064E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:27:9:18:66.67%:5,4,9,9    0/1:.:26:15:11:42.31%:6,9,4,7
20  75254   .   C   A   .   PASS    DP=74;SS=1;SSC=9;GPV=7.9203E-12;SPV=1.1567E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:34:22:11:33.33%:13,9,5,6  0/1:.:40:20:20:50%:5,15,14,6

And I get:

Traceback (most recent call last):
  File "/Users/tavi/.virtualenvs/utopia/bin/topiary", line 99, in <module>
    main()
  File "/Users/tavi/.virtualenvs/utopia/bin/topiary", line 59, in main
    variants = variant_collection_from_args(args)
  File "/Users/tavi/.virtualenvs/utopia/lib/python2.7/site-packages/topiary/args.py", line 59, in variant_collection_from_args
    variant_collections.append(varcode.load_vcf(vcf_path))
  File "/Users/tavi/.virtualenvs/utopia/lib/python2.7/site-packages/varcode/vcf.py", line 73, in load_vcf
    path,))
ValueError: Unable to infer reference genome for snv.vcf

It's not obvious to me how to work around that from the commandline.

timodonnell commented 8 years ago

The current master seems to support this with the --reference-name argument. E.g:

$ topiary --vcf test/data/CELSR1/vcfs/vcf_1.vcf --mhc-predictor netmhcpan --mhc-alleles "A:02:01,A:02:02" --output-csv /tmp/out.csv --reference-name grch37 --ic50-cutoff 1000 --percentile-cutoff 100
serge2016 commented 7 years ago

Now --reference-name grch37 --> --genome grch37