lazear / sage

Proteomics search & quantification so fast that it feels like magic
MIT License

No IMS model ran mzML input #117

Closed patrick-willems closed 3 months ago

patrick-willems commented 5 months ago


I was eager to test the new IMS model on some of our data. I use MS2-only mzML files that contain ion mobility fields such as:

<cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.1835660934448242"/>

After successfully running sage, the ion_mobility, predicted_mobility and sqrt(delta_mobility) columns in the PIN and TSV outputs all default to zero for all PSMs. Do we need to specify another JSON parameter, or might the info not be parsed correctly from the mzML?


lazear commented 5 months ago

Can you share which converter you used, and the mzML for the full spectrum (or the whole file)? I am going to guess that this has to do with IMS being written to different parts of the XML document by different converters.
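
One quick way to check where a given converter writes the ion mobility value is to count which parent element holds the MS:1002815 cvParam. The snippet below is only a rough sketch in Python (not part of sage): the `ims_parent_counts` helper and the `SAMPLE` fragment are made up for illustration, and the fragment assumes the cvParam sits inside a `<scan>` element under `<scanList>`, as the mzML schema lays it out.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Accession for "inverse reduced ion mobility" (1/k0), as quoted in this thread.
IMS_ACCESSION = "MS:1002815"


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def ims_parent_counts(root: ET.Element) -> Counter:
    """Count, per parent element name, the IMS cvParams that are direct children."""
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            if (
                local_name(child.tag) == "cvParam"
                and child.get("accession") == IMS_ACCESSION
            ):
                counts[local_name(parent.tag)] += 1
    return counts


# Hypothetical, heavily trimmed spectrum used only to exercise the helper.
SAMPLE = """<spectrum index="0" id="scan=1" defaultArrayLength="0">
  <scanList count="1">
    <scan>
      <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.25"/>
    </scan>
  </scanList>
</spectrum>"""

counts = ims_parent_counts(ET.fromstring(SAMPLE))
print(dict(counts))
```

For a real file, `ET.parse(path).getroot()` could be passed in instead of the parsed sample, although very large mzML files may be better served by a streaming parser.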

patrick-willems commented 5 months ago

These files were converted by MSFragger. The gzipped mzML can be downloaded via:

Random example spectrum:

<spectrum index="18851" id="controllerType=0 controllerNumber=1 scan=18874" defaultArrayLength="207" dataProcessingRef="MSFragger">
          <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
          <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
          <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000504" name="base peak m/z" value="720.385009765625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000505" name="base peak intensity" value="3636.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
          <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="16334.0"/>
          <cvParam cvRef="MS" accession="MS:1000528" name="lowest observed m/z" value="199.143798828125" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000527" name="highest observed m/z" value="1539.11572265625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000796" name="spectrum title" value="T05451_AurEl4_PM2_CMB-1482-MHCI_new_GB3_1_6084.18874.18874.1"/>
          <userParam name="uncalibrated precursor mz" value="720.37695"/>
          <scanList count="1">
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="19.516155242919922" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.2539584636688232"/>
              <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value=""/>
              <scanWindowList count="1">
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="199.143798828125" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1539.11572265625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <precursorList count="1">
            <precursor spectrumRef="controllerType=0 controllerNumber=1 scan=10898">
                <cvParam cvRef="MS" accession="MS:1000827" name="isolation window target m/z" value="720.657615" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000828" name="isolation window lower offset" value="1.1076272070312143" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000829" name="isolation window upper offset" value="1.1023947656250357" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <userParam name="ms level" value="1"/>
              <selectedIonList count="1">
                  <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="720.3768122581744" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
                  <cvParam cvRef="MS" accession="MS:1000042" name="peak intensity" value="5823.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
                <cvParam cvRef="MS" accession="MS:1000044" name="UNKNOWN" value=""/>
          <binaryDataArrayList count="2">
            <binaryDataArray arrayLength="207" encodedLength="1064">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
            <binaryDataArray arrayLength="207" encodedLength="572">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
jspaezp commented 5 months ago

ok ... the issue is that I was expecting the 1/k0 info to be inside the selected ion info

              <selectedIonList count="1">
                  <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="720.3768122581744" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
                  <cvParam cvRef="MS" accession="MS:1000042" name="peak intensity" value="5823.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>

whilst in your data it is located in the scan info

          <scanList count="1">
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="19.516155242919922" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.2539584636688232"/>
              <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value=""/>
              <scanWindowList count="1">
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="199.143798828125" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1539.11572265625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>

... I think both are valid annotations under the current mzML specification, since the term is both a:

  1. ion mobility attribute -> scan attribute
  2. ion selection attribute. line 18481

The solution would be to copy the content here:

to this line as well ...

IN THE MEANTIME you can pass the .d directly! (if you have one ...)
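
To illustrate the idea of accepting the value from either location, here is a minimal sketch in Python. It is not sage's actual Rust parser: the `find_ims` helper, the preference for the selected ion over the scan, and the two sample fragments are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Accession for "inverse reduced ion mobility" (1/k0), as quoted in this thread.
IMS_ACCESSION = "MS:1002815"


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def find_ims(spectrum: ET.Element) -> Optional[float]:
    """Return 1/k0 from <selectedIon> if present, else from <scan>, else None."""
    found = {}
    for parent in spectrum.iter():
        name = local_name(parent.tag)
        if name not in ("selectedIon", "scan"):
            continue
        for child in parent:
            if (
                local_name(child.tag) == "cvParam"
                and child.get("accession") == IMS_ACCESSION
            ):
                # Keep the first value seen for each location.
                found.setdefault(name, float(child.get("value")))
    # Assumed precedence: selected ion first, scan as the fallback.
    for location in ("selectedIon", "scan"):
        if location in found:
            return found[location]
    return None


# Hypothetical fragments: one per annotation style discussed above.
SCAN_ONLY = (
    '<spectrum><scanList count="1"><scan>'
    '<cvParam accession="MS:1002815" value="1.25"/>'
    "</scan></scanList></spectrum>"
)
SELECTED_ION = (
    '<spectrum><precursorList count="1"><precursor>'
    '<selectedIonList count="1"><selectedIon>'
    '<cvParam accession="MS:1002815" value="0.9"/>'
    "</selectedIon></selectedIonList></precursor></precursorList></spectrum>"
)

scan_value = find_ims(ET.fromstring(SCAN_ONLY))
ion_value = find_ims(ET.fromstring(SELECTED_ION))
missing = find_ims(ET.fromstring("<spectrum/>"))
print(scan_value, ion_value, missing)
```

Returning `None` when neither location carries the term keeps "no ion mobility recorded" distinct from a real value of zero, which is the ambiguity that made the all-zero output columns hard to interpret.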

jspaezp commented 5 months ago

(possible) fix there

patrick-willems commented 4 months ago

Thanks for the help! I can work with this.