lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

No IMS model ran mzML input #117

Closed patrick-willems closed 3 months ago

patrick-willems commented 5 months ago

Hey,

I was eager to test the new IMS model on some of our data. I use MS2-only mzML files that contain ion mobility fields such as:

<cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.1835660934448242"/>

After successfully running Sage, the ion_mobility, predicted_mobility and sqrt(delta_mobility) columns in the PIN and TSV outputs all default to zero for all PSMs. Do we need to specify another JSON parameter, or is the info perhaps not being parsed correctly from the mzML?

Thanks!

lazear commented 5 months ago

Can you share which converter you used, and the mzML for the full spectrum (or the whole file)? I am going to guess that this has to do with IMS being written to different parts of the XML document by different converters.
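
One quick way to check where a given converter writes the ion mobility term is to count the parent elements that directly contain the MS:1002815 cvParam. The following is a minimal Python sketch for that check, not part of Sage; the function names (`local_name`, `ims_parent_counts`) are hypothetical, and the embedded snippet is a trimmed copy of the spectrum shared in this thread.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

IMS_ACCESSION = "MS:1002815"  # inverse reduced ion mobility (1/K0)

# Trimmed copy of the spectrum posted in this thread.
SNIPPET = b"""<spectrum index="18851" defaultArrayLength="207">
  <scanList count="1">
    <scan>
      <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="19.516155242919922"/>
      <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.2539584636688232"/>
    </scan>
  </scanList>
  <precursorList count="1">
    <precursor>
      <selectedIonList count="1">
        <selectedIon>
          <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="720.3768122581744"/>
        </selectedIon>
      </selectedIonList>
    </precursor>
  </precursorList>
</spectrum>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://psi.hupo.org/ms/mzml}'."""
    return tag.rsplit("}", 1)[-1]


def ims_parent_counts(source):
    """Count which parent elements directly contain the 1/K0 cvParam.

    `source` is a file name or a binary file object (hypothetical helper).
    """
    counts = Counter()
    stack = []  # local names of the currently open elements
    # Stream the document so large mzML files are not held in memory at once.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(local_name(elem.tag))
            continue
        stack.pop()  # after popping, stack[-1] is the parent of `elem`
        name = local_name(elem.tag)
        if name == "cvParam" and elem.get("accession") == IMS_ACCESSION:
            counts[stack[-1] if stack else "<root>"] += 1
        elif name == "spectrum":
            elem.clear()  # drop the children of a finished spectrum
    return counts


if __name__ == "__main__":
    print(dict(ims_parent_counts(io.BytesIO(SNIPPET))))  # {'scan': 1}
```

Passing the path of a full mzML file instead of the `io.BytesIO` wrapper should give a per-file summary, for example whether every occurrence sits under `scan` or under `selectedIon`.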

patrick-willems commented 5 months ago

These files were converted by MSFragger. The gzipped mzML can be downloaded via: https://filesender.belnet.be/?s=download&token=0f657e1d-ea0c-4c75-8b89-227cf2d1394c

Random example spectrum:

<spectrum index="18851" id="controllerType=0 controllerNumber=1 scan=18874" defaultArrayLength="207" dataProcessingRef="MSFragger">
          <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
          <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
          <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000504" name="base peak m/z" value="720.385009765625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000505" name="base peak intensity" value="3636.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
          <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="16334.0"/>
          <cvParam cvRef="MS" accession="MS:1000528" name="lowest observed m/z" value="199.143798828125" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000527" name="highest observed m/z" value="1539.11572265625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
          <cvParam cvRef="MS" accession="MS:1000796" name="spectrum title" value="T05451_AurEl4_PM2_CMB-1482-MHCI_new_GB3_1_6084.18874.18874.1"/>
          <userParam name="uncalibrated precursor mz" value="720.37695"/>
          <scanList count="1">
            <scan>
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="19.516155242919922" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.2539584636688232"/>
              <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value=""/>
              <scanWindowList count="1">
                <scanWindow>
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="199.143798828125" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1539.11572265625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                </scanWindow>
              </scanWindowList>
            </scan>
          </scanList>
          <precursorList count="1">
            <precursor spectrumRef="controllerType=0 controllerNumber=1 scan=10898">
              <isolationWindow>
                <cvParam cvRef="MS" accession="MS:1000827" name="isolation window target m/z" value="720.657615" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000828" name="isolation window lower offset" value="1.1076272070312143" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <cvParam cvRef="MS" accession="MS:1000829" name="isolation window upper offset" value="1.1023947656250357" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                <userParam name="ms level" value="1"/>
              </isolationWindow>
              <selectedIonList count="1">
                <selectedIon>
                  <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="720.3768122581744" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
                  <cvParam cvRef="MS" accession="MS:1000042" name="peak intensity" value="5823.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
                </selectedIon>
              </selectedIonList>
              <activation>
                <cvParam cvRef="MS" accession="MS:1000044" name="UNKNOWN" value=""/>
              </activation>
            </precursor>
          </precursorList>
          <binaryDataArrayList count="2">
            <binaryDataArray arrayLength="207" encodedLength="1064">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <binary>eJwt0VlIVGEYBuA/jRYrxCVopdHpwsAodQgLgj9NabFSR4mM8KgtbpCjWTlWHk8WpCk4tloXpzEIxZtsU7M4jJaGRKJJJUTHyS6MIhq7yRZq3terh+973/87IwohRPNEvvyP/sVa4Ffp8RT5FTGHijGviizxa5bHQOO6G4q9HZzPD1MtttSvOlgKlXIP1OvGocwILPNr+eqAcokG1XluKLK80PQEHMO91RpUVvqgHIorR+9xIl3YS3Osx5E7qNgfB9WGXM6jjcyfuqBS6aYpLdB46YVmV+4J9LwaNAaGoeqa4j457CTuOdOhfGaHuiePdrqh2t/BXv4ItAROQLPIx96cKeadoRW402+v4O91QbN1GOohVid6mRlQSToG1So31INvMy/vdfL3+ThvTqzEnZ9UHrZD9UYj994mKHbd515OQH2r7RS+czkdmsFlnEuoWO+GMu3uKf4fe6H+KfQ0enfDoTlggyI8Ccp5dmiJqWHP0cTeN2ocuAf1TA/zno9898Z2BvkCOxT2DKhEH4RyRQ1Uf3pocR/3/T/Ym0yowndnN0E1e5jzntdQxo9C0z4BjTk/oB5FleT5Kr47EkIXh9I3EdBYGgv1bTYoXRuZ92yja7Og+e4k80INqt5W5tEv+H6IiifjUEkJqsY+fhlU6mKhjNsN1bf7oNmSxfn5BfYia6HhmLGnmb33t9jLGYOWh+OcG7zsRfs4VwZp6P/ZTlcUQMvhQijHHVC5VEp/OZnv1DgfaYR6ahtUF3VDI4Va1n2AYnCS9x2f2X9FzZBpvrtC5fdZZ/G+OhLK51ugCE6kG7KgElLI3upiaOl1cm9tg3p6N/OxSWimTHPfFVCDftQCqBZQY9YmzjsSmBdoUPylyplaKMs6aMQI1F1B5zC3R0JlgKphVmjWxHKOyuP8uwLqg24o1rTw/fRn7ke+c7/Rx3tVMyZNM2+k4uaMfVRmpp7H3xGVB2X3Uah+qKD1TqhMUUtyJRRp1Eirhma7xn74Wd65c469+gvMl9ZyX3WR+w1XeTdb5x2bm725D5gvf8R7xxdek/8AdMO7dw==</binary>
            </binaryDataArray>
            <binaryDataArray arrayLength="207" encodedLength="572">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
              <binary>eJx1lEtOQzEMRbOUjhBCCFGQSqFAQwuFln/5iL9ASKh8RsACsoqOuwAGLOEtpUOWgeCdaylGvMmRnbw4vrEdws+3EX8ROnANNp2/AdslE/60hX8CLsF1OM++LvYK3CtZHGPvYvexD+Fcfn5x4/bzf7wizoD1KnbEXmWfzq2x/ordg0fwgvUWcU5y2/KFCd3iKbby1/4dWGed/NM+fvQ1PaV3Nz/P4k9D9Ex6xyk4WfLzLd9XkH9Cv6B46GP1IN0azr8MF+EZlA46F0r/eJDHK7CtLnRvp6vdQ/fv5/+N7lgf5P+lfv6f6Vx38fDr3eI1/pa7h/brHWvOjy7KN23il37KA930XlafvFeYhapH6e70qDy5+ND02cbfzKk6tfp7zs9PxP3T19J5AfJ+Uf3Sc/vVZy1YdUS/qP4n/9G7O0/zQvmofvWf9COu6kx5Bl9fjppjYQZKL+IlvXcn91vfqA+kn+bRJbZ01jmKr35nniXXLzZ/iBOxw3nJ8UvOyiPrqiPfR8ztwP1Un+m+5O1Dvv9rWHLwgV/zh/4Y85/Nof/6V/dpm/8b22aSaw==</binary>
            </binaryDataArray>
          </binaryDataArrayList>
        </spectrum>

jspaezp commented 5 months ago

OK, the issue is that I was expecting the 1/K0 value to be inside the selected ion info:

              <selectedIonList count="1">
                <selectedIon>
                  <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="720.3768122581744" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
                  <cvParam cvRef="MS" accession="MS:1000042" name="peak intensity" value="5823.0" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
                </selectedIon>
              </selectedIonList>

whereas in your data it is located in the scan info:

          <scanList count="1">
            <scan>
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="19.516155242919922" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.2539584636688232"/>
              <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value=""/>
              <scanWindowList count="1">
                <scanWindow>
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="199.143798828125" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1539.11572265625" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                </scanWindow>
              </scanWindowList>
            </scan>
          </scanList>

I think both are viable annotations according to the current mzML specification, since the term is both:

  1. an ion mobility attribute -> scan attribute
  2. an ion selection attribute

https://github.com/HUPO-PSI/psi-ms-CV/blob/master/psi-ms.obo line 18481

The solution would be to copy the handling from here: https://github.com/jspaezp/sage/blob/a7414c8bcf608c5b5ff5271c08d1f5d87d72c23b/crates/sage-cloudpath/src/mzml.rs#L241-L243

to this line as well:

https://github.com/jspaezp/sage/blob/a7414c8bcf608c5b5ff5271c08d1f5d87d72c23b/crates/sage-cloudpath/src/mzml.rs#L263
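
The idea of accepting the term in either location can be illustrated with a small Python sketch. This is not Sage's Rust implementation; the helper names (`find_cv_value`, `inverse_ion_mobility`) are hypothetical, and checking `selectedIon` before `scan` is an assumed order chosen only for illustration.

```python
import xml.etree.ElementTree as ET

IMS_ACCESSION = "MS:1002815"  # inverse reduced ion mobility (1/K0)

# Scan-level annotation, trimmed from the spectrum posted in this thread.
SCAN_LEVEL_SPECTRUM = """<spectrum index="18851" defaultArrayLength="207">
  <scanList count="1">
    <scan>
      <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="1.2539584636688232"/>
    </scan>
  </scanList>
  <precursorList count="1">
    <precursor>
      <selectedIonList count="1">
        <selectedIon>
          <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="1"/>
        </selectedIon>
      </selectedIonList>
    </precursor>
  </precursorList>
</spectrum>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://psi.hupo.org/ms/mzml}'."""
    return tag.rsplit("}", 1)[-1]


def find_cv_value(parent, accession):
    """Return the value of a direct-child cvParam with this accession, or None."""
    for child in parent:
        if local_name(child.tag) == "cvParam" and child.get("accession") == accession:
            return child.get("value")
    return None


def inverse_ion_mobility(spectrum):
    """Read 1/K0 from <selectedIon> if present, otherwise from <scan>.

    The lookup order is an assumption made for this sketch.
    """
    for container in ("selectedIon", "scan"):
        for elem in spectrum.iter():
            if local_name(elem.tag) != container:
                continue
            value = find_cv_value(elem, IMS_ACCESSION)
            if value is not None:
                return float(value)
    return None  # the spectrum carries no ion mobility annotation


if __name__ == "__main__":
    spectrum = ET.fromstring(SCAN_LEVEL_SPECTRUM)
    print(inverse_ion_mobility(spectrum))
```

Returning `None` rather than zero when the term is absent keeps "no annotation" distinguishable from a real value, which is the ambiguity that showed up as all-zero columns in the report above.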

IN THE MEANTIME you can pass the .d directly! (if you have one ...)

jspaezp commented 5 months ago

https://github.com/lazear/sage/pull/119

(possible) fix there

patrick-willems commented 4 months ago

Thanks for the help! I can work with this.