Closed iskandr closed 8 years ago
Hey Guys,
Not sure if this is helpful, but I thought it might be worth posting in case it helps identify a potential bug. I cloned this branch, ran it with the same .gtf Alex used to test the additions to the PR, and got the following error.
Topiary commandline arguments:
Namespace(ic50_cutoff=500.0, json_variant_files=[], maf=[], mhc_alleles='H2-Kb,H2-Db', mhc_alleles_file=None, mhc_epitope_lengths=[8, 9, 10, 11], mhc_predictor='netmhcpan', only_novel_epitopes=False, output_csv='/Users/johnfinnigan/Desktop/TEMP/Results/MTA/Tumor_B16.F1_0821/RNA/Tumor_B16.F1_0821_mutect.targets.pass.vcf.netmhcpan.RNA.csv', output_html=None, padding_around_mutation=None, percentile_cutoff=None, reference_name=None, rna_gene_fpkm_tracking_file=None, rna_min_gene_expression=0.0, rna_min_transcript_expression=0.1, rna_transcript_fpkm_gtf_file='/Users/johnfinnigan/Desktop/TEMP/Results/RNA/Tumor_B16.F1/Tumor_B16.F1_0821.115B/GTF/StringTie/HISAT2/Tumor_B16.F1_0821.115B.HISAT2.sorted.gtf', rna_transcript_fpkm_tracking_file=None, skip_variant_errors=False, variant=[], vcf=['/Users/johnfinnigan/Desktop/TEMP/Results/WES/Tumor_B16_F1_0821/ISMMS/VCF/MuTect/Tumor_B16_F1_0821.mutect.targets.pass.vcf'], wildtype_ligandome_directory=None)
INFO:root:Building MHC binding prediction type for alleles ['H-2-Kb', 'H-2-Db'] and epitope lengths [8, 9, 10, 11]
INFO:root:Skipping allele SLA-1-CHANGDA: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB01: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB02: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB03: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-HB04: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-LWH: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-TPK: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-YC: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-YDL01: Malformed MHC type 1
INFO:root:Skipping allele SLA-1-YTH: Malformed MHC type 1
INFO:root:Skipping allele SLA-2-YDL02: Malformed MHC type 2
INFO:root:Skipping allele SLA-3-CDY: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-HB01: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-LWH: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-TPK: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YC: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YDL: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YDY01: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YDY02: Malformed MHC type 3
INFO:root:Skipping allele SLA-3-YTH: Malformed MHC type 3
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/topiary", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/scripts/topiary", line 64, in <module>
    main()
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/scripts/topiary", line 46, in main
    epitopes = predict_epitopes_from_args(args)
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/predict_epitopes.py", line 278, in predict_epitopes_from_args
    transcript_expression_dict = rna_transcript_expression_dict_from_args(args)
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/commandline_args.py", line 304, in rna_transcript_expression_dict_from_args
    args.rna_transcript_fpkm_tracking_file)
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/rna/gtf.py", line 47, in load_transcript_fpkm_dict_from_gtf
    column_converters={fpkm_column_name: float})
  File "/Users/johnfinnigan/Desktop/Utilities/Topiary/gtfparse/gtfparse/gtfparse/read_gtf.py", line 58, in read_gtf_as_dict
    if not exists(filename):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
My command line input:
JPF-MBP:~ johnfinnigan$ python /Library/Frameworks/Python.framework/Versions/2.7/bin/topiary \
> --vcf ~/Desktop/TEMP/Results/WES/Tumor_B16_F1_0821/ISMMS/VCF/MuTect/Tumor_B16_F1_0821.mutect.targets.pass.vcf \
> --mhc-predictor netmhcpan \
> --mhc-alleles H2-Kb,H2-Db \
> --mhc-epitope-lengths 8,9,10,11 \
> --ic50-cutoff 500 \
> --rna-transcript-fpkm-gtf-file ~/Desktop/TEMP/Results/RNA/Tumor_B16.F1/Tumor_B16.F1_0821.115B/GTF/StringTie/HISAT2/Tumor_B16.F1_0821.115B.HISAT2.sorted.gtf \
> --rna-min-transcript-expression 0.1 \
> --output-csv ~/Desktop/TEMP/Results/MTA/Tumor_B16.F1_0821/RNA/Tumor_B16.F1_0821_mutect.targets.pass.vcf.netmhcpan.RNA.csv
Any ideas?
Sorry @JPFinnigan, I was passing the wrong filename to the GTF parser. Try again?
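For anyone reading the traceback later: the frame at commandline_args.py line 304 shows `args.rna_transcript_fpkm_tracking_file` being handed to the GTF loader, and that attribute is `None` in the Namespace printed above, while the GTF path sits in `rna_transcript_fpkm_gtf_file`. A rough sketch of that kind of mix-up follows. The helper names are hypothetical stand-ins, not Topiary's actual code, and the stub loader only checks its argument.

```python
from argparse import Namespace


def load_fpkm_dict_stub(filename):
    """Hypothetical stand-in for a GTF loader; it only validates the path."""
    if filename is None:
        # Roughly what os.stat(None) complained about in the traceback.
        raise TypeError("expected a GTF path, got None")
    return {"source": filename}


def pick_gtf_path_buggy(args):
    # Mirrors the traceback: the tracking-file attribute is read by mistake.
    return args.rna_transcript_fpkm_tracking_file


def pick_gtf_path_fixed(args):
    # Reads the attribute that --rna-transcript-fpkm-gtf-file populates.
    return args.rna_transcript_fpkm_gtf_file


# Only the two relevant attributes from the Namespace above are reproduced;
# "sample.gtf" is a placeholder path.
args = Namespace(
    rna_transcript_fpkm_tracking_file=None,
    rna_transcript_fpkm_gtf_file="sample.gtf",
)

try:
    load_fpkm_dict_stub(pick_gtf_path_buggy(args))
    buggy_failed = False
except TypeError:
    buggy_failed = True

fixed_result = load_fpkm_dict_stub(pick_gtf_path_fixed(args))
```

With the corrected lookup the loader receives the GTF path instead of `None`.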
Looks good to me % minor questions. @JPFinnigan good find, I wouldn't have caught that error in my review.
@iskandr LGTM
Implements long-requested feature (https://github.com/hammerlab/topiary/issues/39) from @JPFinnigan.
Adds load_transcript_fpkm_dict_from_gtf, which generates a transcript ID -> FPKM dictionary from a given GTF file (using gtfparse for most of the actual work).
--rna-gene-fpkm-file becomes --rna-gene-fpkm-tracking-file. Similarly, --rna-transcript-fpkm-file becomes --rna-transcript-fpkm-tracking-file.
Adds --rna-transcript-fpkm-gtf-file. There's no gene version of this flag since StringTie seems to only estimate transcript-level FPKMs. If someone using a different tool requests it, we can easily extend the code to handle gene IDs as well.
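To make the transcript ID -> FPKM mapping concrete, here is a minimal standard-library sketch. It is not the gtfparse-based implementation in this PR. It assumes StringTie-style `transcript` rows that carry `transcript_id` and `FPKM` attributes in the ninth column. The function name, its parameters, and the sample rows are made up for illustration.

```python
import re

# Matches attribute pairs such as: transcript_id "STRG.1.1";
_ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_fpkm_dict_from_gtf_lines(
        lines, id_attribute="transcript_id", fpkm_attribute="FPKM"):
    """Map each ID found on 'transcript' rows to its FPKM value (sketch)."""
    result = {}
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF row has nine tab-separated columns; column 3 is the feature.
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attributes = dict(_ATTRIBUTE_PATTERN.findall(fields[8]))
        if id_attribute in attributes and fpkm_attribute in attributes:
            # Later rows with the same ID overwrite earlier ones.
            result[attributes[id_attribute]] = float(attributes[fpkm_attribute])
    return result


# Made-up rows in the general shape of StringTie output.
example = [
    "# illustrative data only\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; cov "3.2"; FPKM "1.5"; TPM "2.0";\n',
    'chr1\tStringTie\texon\t100\t300\t1000\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; exon_number "1"; cov "3.2";\n',
    'chr1\tStringTie\ttranscript\t2000\t2500\t1000\t-\t.\t'
    'gene_id "G2"; transcript_id "T2"; cov "0.1"; FPKM "0.05"; TPM "0.07";\n',
]

fpkm_by_transcript = transcript_fpkm_dict_from_gtf_lines(example)
fpkm_by_gene = transcript_fpkm_dict_from_gtf_lines(example, id_attribute="gene_id")
```

Passing `id_attribute="gene_id"` hints at the gene-level extension mentioned above. In this sketch a gene with several transcripts would keep only the last FPKM seen, so a real gene-level version would need an aggregation rule.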