Closed anoronh4 closed 4 years ago
Tried renaming "Splice_Region" to "Splice_Site" in the MAF and re-ran the script, but it still returns the same error: No columns to parse from file
/juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/TempoMegatron/containers/metadataparser/create_metadata_file.py --sampleID SU2LC_MSK_1089_T__SU2LC_MSK_1089_N --tumorID SU2LC_MSK_1089_T --normalID SU2LC_MSK_1089_N --facetsPurity_out ../../../TempoMegatron/results/somatic/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N/facets/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N/facets0.5.14c100pc500/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N_purity.out --facetsQC /juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/TempoMegatron/results/somatic/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N/facets/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N/facets0.5.14c100pc500/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N.qc.txt --MSIsensor_output /juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/work/43/9a63b7a3b82cfa11efe0d5d67bcd1a/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N.msisensor.tsv --mutational_signatures_output /juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/work/4f/9538da3636c2d72918ce611177ec2d/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N.mutsig.txt --polysolver_output /juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/work/f6/356aee770f90a453aba46889d9c9c4/SU2LC_MSK_1089_N.hla.txt --MAF_input /juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/TempoMegatron/results/somatic/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N/combined_mutations/SU2LC_MSK_1089_T__SU2LC_MSK_1089_N.somatic.final.maf --coding_baits_BED /juno/work/taylorlab/cmopipeline/mskcc-igenomes/grch37/coding_regions/AgilentExon_51MB_b37_v3_baits.coding.sorted.merged.bed
Traceback (most recent call last):
  File "/juno/work/ccs/ravichav/Hellman_Exomes/IlluminaExome_38MB/TempoMegatron/containers/metadataparser/create_metadata_file.py", line 191, in <module>
    resultdf = pd.read_csv(resulting_intersection.fn, sep="\t", header=None)
  File "/work/offit/Programz/AnacondaForPy27/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/work/offit/Programz/AnacondaForPy27/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/work/offit/Programz/AnacondaForPy27/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/work/offit/Programz/AnacondaForPy27/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/work/offit/Programz/AnacondaForPy27/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
As a secondary check, I ran bedtools intersect between the Agilent bait file and the MAF. The output has two matching records, which are SILENT and INTRON.
@vigneshravi this is likely because, even with the additional variant, the MAF still did not intersect any coding regions, so the Python script was again unable to parse the empty result. That empty result is the core issue behind your error. I will open a new issue for that problem.
The create_metadata_file.py script in the MetaDataParser process uses the number of records in the MAF file to report tumor mutational burden (TMB), but it currently filters out Splice_Region, probably because that value is not in the official MAF specs. Instead it accepts Splice_Site, which is a standard value and, to my understanding, refers to the same thing. This line filters the MAF file: https://github.com/mskcc/tempo/blob/master/containers/metadataparser/create_metadata_file.py#L283 Possible solutions are to change the vcf2maf package (since Splice_Region is considered non-standard anyway), to replace Splice_Region with Splice_Site, or to include the value in the filter used to calculate TMB.
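A rough sketch of the replace option, assuming the MAF is tab-separated with a Variant_Classification column. The ACCEPTED set below is a hypothetical whitelist for illustration only (the actual list used at the linked line may differ), and both function names are made up for this example.

```python
import csv

# Hypothetical whitelist of classifications counted toward TMB.
# The real list in create_metadata_file.py may differ.
ACCEPTED = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "In_Frame_Del",
    "In_Frame_Ins",
    "Splice_Site",
}


def normalize_classification(value):
    """Map Splice_Region onto Splice_Site; leave other values unchanged."""
    return "Splice_Site" if value == "Splice_Region" else value


def count_tmb_records(maf_text, accepted=ACCEPTED):
    """Count MAF records whose normalized classification is accepted."""
    # Drop blank lines and '#' comment lines (e.g. a version header).
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return sum(
        1
        for row in reader
        if normalize_classification(row["Variant_Classification"]) in accepted
    )


if __name__ == "__main__":
    example = (
        "#version 2.4\n"
        "Hugo_Symbol\tVariant_Classification\n"
        "GENE_A\tSilent\n"
        "GENE_B\tIntron\n"
        "GENE_C\tSplice_Region\n"
        "GENE_D\tMissense_Mutation\n"
    )
    print(count_tmb_records(example))
```

Normalizing before filtering keeps the whitelist unchanged, whereas adding Splice_Region to the whitelist would be the "filter in" option. Which one is appropriate depends on whether the two values really should be treated as equivalent.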