openvax / topiary

Predict mutated T-cell epitopes from sequencing data
Apache License 2.0

Added option to read transcript FPKM values from StringTie GTF file #40

Closed (iskandr closed this 8 years ago)

iskandr commented 8 years ago

Implements a long-requested feature (https://github.com/hammerlab/topiary/issues/39) from @JPFinnigan.
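For readers unfamiliar with the format, here is a minimal sketch of what pulling per-transcript FPKM values out of a StringTie-style GTF can look like. It is not the code in this PR. It assumes that rows whose feature column is `transcript` carry `transcript_id` and `FPKM` attributes as quoted key/value pairs in the ninth column. The function name and the example rows are made up for illustration.

```python
import re

# Matches key "value" pairs in the GTF attribute column, e.g. FPKM "1.5";
_ATTRIBUTE_PATTERN = re.compile(r'(\w+) "([^"]*)"')


def transcript_fpkm_from_gtf_lines(lines):
    """Map transcript_id -> FPKM for 'transcript' rows of a StringTie-style GTF.

    Hypothetical helper for illustration only; not part of topiary or gtfparse.
    """
    fpkm_by_transcript = {}
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF row has nine tab-separated columns; column 3 is the feature type.
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attributes = dict(_ATTRIBUTE_PATTERN.findall(fields[8]))
        if "transcript_id" in attributes and "FPKM" in attributes:
            fpkm_by_transcript[attributes["transcript_id"]] = float(attributes["FPKM"])
    return fpkm_by_transcript


# Made-up example rows (assumed layout, not taken from a real run).
example_lines = [
    "# hypothetical header line\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; cov "3.2"; FPKM "1.5"; TPM "2.0";\n',
    'chr1\tStringTie\texon\t100\t400\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; exon_number "1"; cov "3.2";\n',
]

fpkm = transcript_fpkm_from_gtf_lines(example_lines)
```

Only `transcript` rows are read because exon rows in this assumed layout do not carry an FPKM attribute. A result like this could then be filtered against a threshold such as `--rna-min-transcript-expression`.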


JPFinnigan commented 8 years ago

Hey Guys,

Not sure if this is helpful, but I thought it might be worth posting in case it helps identify a potential bug. I cloned this branch, ran it with the same .gtf Alex used to test the additions in this PR, and got the following error:

Topiary commandline arguments:
Namespace(ic50_cutoff=500.0, json_variant_files=[], maf=[], mhc_alleles='H2-Kb,H2-Db', mhc_alleles_file=None, mhc_epitope_lengths=[8, 9, 10, 11], mhc_predictor='netmhcpan', only_novel_epitopes=False, output_csv='/Users/johnfinnigan/Desktop/TEMP/Results/MTA/Tumor_B16.F1_0821/RNA/Tumor_B16.F1_0821_mutect.targets.pass.vcf.netmhcpan.RNA.csv', output_html=None, padding_around_mutation=None, percentile_cutoff=None, reference_name=None, rna_gene_fpkm_tracking_file=None, rna_min_gene_expression=0.0, rna_min_transcript_expression=0.1, rna_transcript_fpkm_gtf_file='/Users/johnfinnigan/Desktop/TEMP/Results/RNA/Tumor_B16.F1/Tumor_B16.F1_0821.115B/GTF/StringTie/HISAT2/Tumor_B16.F1_0821.115B.HISAT2.sorted.gtf', rna_transcript_fpkm_tracking_file=None, skip_variant_errors=False, variant=[], vcf=['/Users/johnfinnigan/Desktop/TEMP/Results/WES/Tumor_B16_F1_0821/ISMMS/VCF/MuTect/Tumor_B16_F1_0821.mutect.targets.pass.vcf'], wildtype_ligandome_directory=None)
INFO:root:Building MHC binding prediction type for alleles ['H-2-Kb', 'H-2-Db'] and epitope lengths [8, 9, 10, 11]
INFO:root:Skipping allele SLA-1-CHANGDA: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB01: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB02: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB03: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB04: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-LWH: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-TPK: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-YC: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-YDL01: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-YTH: Malformed MHC type 1
INFO:root:Skipping allele SLA-2-YDL02: Malformed MHC type 2
INFO:root:Skipping allele SLA-3-CDY: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-HB01: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-LWH: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-TPK: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YC: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YDL: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YDY01: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YDY02: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YTH: Malformed MHC type 3
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/topiary", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/scripts/topiary", line 64, in <module>
    main()
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/scripts/topiary", line 46, in main
    epitopes = predict_epitopes_from_args(args)
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/predict_epitopes.py", line 278, in predict_epitopes_from_args
    transcript_expression_dict = rna_transcript_expression_dict_from_args(args)
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/commandline_args.py", line 304, in rna_transcript_expression_dict_from_args
    args.rna_transcript_fpkm_tracking_file)
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/rna/gtf.py", line 47, in load_transcript_fpkm_dict_from_gtf
    column_converters={fpkm_column_name: float})
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/gtfparse/gtfparse/gtfparse/read_gtf.py", line 58, in read_gtf_as_dict
    if not exists(filename):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

My command line input:

JPF-MBP:~ johnfinnigan$ python /Library/Frameworks/Python.framework/Versions/2.7/bin/topiary \
> --vcf ~/Desktop/TEMP/Results/WES/Tumor_B16_F1_0821/ISMMS/VCF/MuTect/Tumor_B16_F1_0821.mutect.targets.pass.vcf \
> --mhc-predictor netmhcpan \
> --mhc-alleles H2-Kb,H2-Db \
> --mhc-epitope-lengths 8,9,10,11 \
> --ic50-cutoff 500 \
> --rna-transcript-fpkm-gtf-file ~/Desktop/TEMP/Results/RNA/Tumor_B16.F1/Tumor_B16.F1_0821.115B/GTF/StringTie/HISAT2/Tumor_B16.F1_0821.115B.HISAT2.sorted.gtf \
> --rna-min-transcript-expression 0.1 \
> --output-csv ~/Desktop/TEMP/Results/MTA/Tumor_B16.F1_0821/RNA/Tumor_B16.F1_0821_mutect.targets.pass.vcf.netmhcpan.RNA.csv

Any ideas?

iskandr commented 8 years ago

Sorry @JPFinnigan, I was passing the wrong filename to the GTF parser. Try again?
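The traceback above is consistent with this: `args.rna_transcript_fpkm_tracking_file` (which was `None` in the reported run) was forwarded to the GTF loader instead of `args.rna_transcript_fpkm_gtf_file`. A rough sketch of the intended dispatch follows. The two attribute names come from the Namespace dump above; the helper function and its return values are hypothetical and are not the actual patch.

```python
from argparse import Namespace


def transcript_fpkm_source_from_args(args):
    """Return (kind, path) for whichever transcript expression file was given.

    Hypothetical helper: it only illustrates picking the matching argument
    for each file type, so a None path never reaches a file reader.
    """
    if args.rna_transcript_fpkm_tracking_file:
        return ("tracking", args.rna_transcript_fpkm_tracking_file)
    if args.rna_transcript_fpkm_gtf_file:
        # The GTF branch must forward the GTF argument, not the tracking one.
        return ("gtf", args.rna_transcript_fpkm_gtf_file)
    # Neither option was supplied: no transcript expression filtering.
    return None


# Mirrors the reported invocation: only the GTF option is set.
reported_args = Namespace(
    rna_transcript_fpkm_tracking_file=None,
    rna_transcript_fpkm_gtf_file="sample.gtf",  # placeholder path
)
source = transcript_fpkm_source_from_args(reported_args)
```

Checking each path before use also turns the opaque `TypeError` from `os.stat(None)` into a case the caller can handle explicitly.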

tavinathanson commented 8 years ago

Looks good to me, modulo minor questions. @JPFinnigan, good find; I wouldn't have caught that error in my review.

tavinathanson commented 8 years ago

@iskandr LGTM