pymzml / pymzML

pymzML - an interface between Python and mzML Mass spectrometry Files
https://pymzml.readthedocs.io/en/latest/
MIT License

Ursgal mzml2mgf conversion fails if mzml contains empty specs #102

Closed StSchulze closed 6 years ago

StSchulze commented 6 years ago

I'm using the mzml2mgf conversion function of Ursgal but I get the following error for some files:

    Converting file: mzml : /home/sschulze/analysis/PXD009116/WT_EXP1_botttom20.mzML to mgf : /home/sschulze/analysis/PXD009116/WT_EXP1_botttom20.mgf
    Traceback (most recent call last):mzML : Processing spectrum 7523
      File "do_it_all_folder_wide_single_validation.py", line 250, in <module>
        target_decoy_database = sys.argv[3],
      File "do_it_all_folder_wide_single_validation.py", line 131, in main
        engine='mzml2mgf_2_0_0',
      File "/home/sschulze/analysis/ursgal/ucontroller.py", line 899, in convert
        output_file_name = output_file_name
      File "/home/sschulze/analysis/ursgal/ucontroller.py", line 351, in convert_to_mgf_and_update_rt_lookup
        force, engine_name, answer
      File "/home/sschulze/analysis/ursgal/ucontroller.py", line 2186, in run_unode_if_required
        json_path = json_path,
      File "/home/sschulze/analysis/ursgal/unode.py", line 1400, in run
        report['execution'] = self._execute()
      File "/home/sschulze/analysis/ursgal/wrappers/mzml2mgf_2_0_0.py", line 71, in _execute
        precursor_max_charge = self.params['translations']['precursor_max_charge'],
      File "/home/sschulze/analysis/ursgal/resources/platform_independent/arc_independent/mzml2mgf_2_0_0/mzml2mgf_2_0_0.py", line 115, in main
        peaks_2_write = spec.peaks('centroided')
      File "/home/sschulze/analysis/pymzml/spec.py", line 1032, in peaks
        mz_params = self._get_encoding_parameters('m/z array')
      File "/home/sschulze/analysis/pymzml/spec.py", line 237, in _get_encoding_parameters
        ns=self.ns
    AttributeError: 'NoneType' object has no attribute 'encode'

It seems like the corresponding spectrum (7523) is empty.

    <spectrum index="7523" id="controllerType=0 controllerNumber=1 scan=7524" defaultArrayLength="0">
      <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
      <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
      <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
      <cvParam cvRef="MS" accession="MS:1000504" name="base peak m/z" value="0.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
      <cvParam cvRef="MS" accession="MS:1000505" name="base peak intensity" value="0.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
      <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="0.0"/>
      <scanList count="1">
        <cvParam cvRef="MS" accession="MS:1000795" name="no combination" value=""/>
        <scan instrumentConfigurationRef="IC2">
          <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="29.54705" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
          <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value="ITMS + c NSI r d Full ms2 1958.86@cid35.00 [525.00-2000.00]"/>
          <cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="10"/>
          <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="15.612242698669" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
          <userParam name="[Thermo Trailer Extra]Monoisotopic M/Z:" value="1958.8634033203125" type="xsd:float"/>
          <scanWindowList count="1">
            <scanWindow>
              <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="525.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="2000.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
            </scanWindow>
          </scanWindowList>
        </scan>
      </scanList>
      <precursorList count="1">
        <precursor spectrumRef="controllerType=0 controllerNumber=1 scan=7522">
          <isolationWindow>
            <cvParam cvRef="MS" accession="MS:1000827" name="isolation window target m/z" value="1958.863403320313" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
            <cvParam cvRef="MS" accession="MS:1000828" name="isolation window lower offset" value="1.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
            <cvParam cvRef="MS" accession="MS:1000829" name="isolation window upper offset" value="1.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          </isolationWindow>
          <selectedIonList count="1">
            <selectedIon>
              <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="1958.863403320313" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="5"/>
            </selectedIon>
          </selectedIonList>
          <activation>
            <cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" value=""/>
            <cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="35.0" unitCvRef="UO" unitAccession="UO:0000266" unitName="electronvolt"/>
          </activation>
        </precursor>
      </precursorList>
      <binaryDataArrayList count="2">
        <binaryDataArray encodedLength="0">
          <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
          <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
          <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <binary></binary>
        </binaryDataArray>
        <binaryDataArray encodedLength="0">
          <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
          <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
          <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
          <binary></binary>
        </binaryDataArray>
      </binaryDataArrayList>
    </spectrum>
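The spectrum above is marked as empty by `defaultArrayLength="0"` (and `encodedLength="0"` on both binary arrays). As a rough sketch, such spectra can be listed with the standard library alone before running the conversion. The helper names `local_name` and `find_empty_spectra` are made up for this illustration and are not part of pymzML or Ursgal.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}spectrum'."""
    return tag.rsplit("}", 1)[-1]


def find_empty_spectra(source):
    """Return the ids of <spectrum> elements whose defaultArrayLength is 0.

    `source` is a file path or a binary file-like object.
    """
    empty_ids = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "spectrum":
            # A missing attribute is treated as "not empty" (-1).
            if int(elem.get("defaultArrayLength", "-1")) == 0:
                empty_ids.append(elem.get("id"))
            elem.clear()  # drop parsed children; mzML files can be large
    return empty_ids


# Tiny stand-in document, not a complete mzML file.
snippet = b"""<spectrumList>
  <spectrum index="0" id="scan=1" defaultArrayLength="3"/>
  <spectrum index="1" id="scan=2" defaultArrayLength="0"/>
</spectrumList>"""
print(find_empty_spectra(io.BytesIO(snippet)))  # ['scan=2']
```

For a real file, passing the path (for example the `.mzML` file from the log above) instead of the `BytesIO` object should work the same way, since `iterparse` accepts both.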

Empty spectra could probably be removed during the conversion from RAW to mzML (I converted the files with ProteoWizard's msconvert), but it would still be more convenient if pymzML did not fail on these spectra and instead skipped them or returned an empty peak list.
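Until that is handled inside pymzML, a caller-side workaround along these lines might be enough. This is only a sketch: `peaks_or_empty` is a hypothetical helper, and it assumes that the `AttributeError` from the traceback is the only way an empty spectrum fails. Catching `AttributeError` this broadly could hide unrelated bugs, so it is a stopgap rather than a fix. The stand-in classes exist only so the example runs without an mzML file.

```python
def peaks_or_empty(spec, peak_type="centroided"):
    """Return spec.peaks(peak_type), or [] if decoding fails on an empty spectrum.

    Assumption: the failure surfaces as AttributeError, as in the traceback.
    """
    try:
        return spec.peaks(peak_type)
    except AttributeError:
        return []


class EmptySpecStandIn:
    """Hypothetical stand-in that mimics the failing spectrum."""

    def peaks(self, peak_type):
        raise AttributeError("'NoneType' object has no attribute 'encode'")


class FilledSpecStandIn:
    """Hypothetical stand-in for a spectrum that has peaks."""

    def peaks(self, peak_type):
        return [(525.3, 1200.0), (1958.9, 340.0)]


for spec in (FilledSpecStandIn(), EmptySpecStandIn()):
    peaks = peaks_or_empty(spec)
    if len(peaks) == 0:
        continue  # skip empty spectra instead of writing them to the MGF
    print(len(peaks), "peaks")
```

In the converter, the call `spec.peaks('centroided')` from the traceback would be the place to apply such a wrapper.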

fu commented 6 years ago

Downloading RAW now, then I'll have a look

Cheers

.c

MKoesters commented 6 years ago

Should be fixed with #105

StSchulze commented 6 years ago

Works for me :) --> can be closed (after merging the pull request, I guess)