pymzml / pymzML

pymzML - an interface between Python and mzML Mass spectrometry Files
https://pymzml.readthedocs.io/en/latest/
MIT License

reintroduce support for <referenceableParamGroup> #92

Closed: ftwkoopmans closed this issue 6 years ago

ftwkoopmans commented 6 years ago

My mzML files lack common <cvParam> elements (e.g., centroid, ms level) in the <spectrum> elements, and instead contain references to a <referenceableParamGroup>.

There was support for this in older pymzML versions (e.g., pymzML-0.7.9: 'referenceableParamGroupList' in run.py and 'initFromTreeObjectWithRef' in spec.py).

I assume this use case is not very common and was therefore overlooked in the initial v2 release, but I'm hoping this feature can be reimplemented so we can migrate from the old version :)

Example snippets:

    <referenceableParamGroupList count="82">
      <referenceableParamGroup id="_sample1experiment1SpectrumParams">
        <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" />
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
        <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" />
        <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" />
      </referenceableParamGroup>
      <referenceableParamGroup id="_sample1experiment1ScanWindowParams">
        <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" unitAccession="MS:1000040" unitCvRef="MS" unitName="m/z" value="350" />
        <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" unitAccession="MS:1000040" unitCvRef="MS" unitName="m/z" value="1250" />
        <userParam name="m/z step" unitAccession="MS:1000040" unitCvRef="MS" unitName="m/z" value="0.376726663876099" />
      </referenceableParamGroup>
     ...
        <spectrum id="sample=1 period=1 cycle=3 experiment=1" defaultArrayLength="1129" index="82">
          <referenceableParamGroupRef ref="_sample1experiment1SpectrumParams" />
          <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="169205" />
          <scanList count="1">
            <cvParam cvRef="MS" accession="MS:1000571" name="sum of spectra" />
            <scan>
              <cvParam cvRef="MS" accession="MS:1000826" name="elution time" unitAccession="UO:0000031" unitCvRef="MS" unitName="minute" value="0.1143" />
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" unitAccession="UO:0000031" unitCvRef="MS" unitName="minute" value="0.1143" />
              <scanWindowList count="1">
                <scanWindow>
                  <referenceableParamGroupRef ref="_sample1experiment1ScanWindowParams" />
                </scanWindow>
              </scanWindowList>
            </scan>
          </scanList>
          <binaryDataArrayList count="2">
            <binaryDataArray encodedLength="11044">
              <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" unitAccession="MS:1000040" unitCvRef="MS" unitName="m/z" />
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" />
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" />
              <binary>...</binary>
            </binaryDataArray>
            <binaryDataArray encodedLength="9532">
              <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" unitAccession="MS:1000131" unitCvRef="MS" unitName="number of counts" />
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" />
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" />
              <binary>...</binary>
            </binaryDataArray>
          </binaryDataArrayList>
        </spectrum>
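To illustrate what resolving these references involves, here is a minimal sketch using only Python's standard xml.etree.ElementTree. It is not pymzML's implementation: the helper names load_param_groups and spectrum_params are hypothetical, and the inline XML is a shortened, namespace-free excerpt of the snippet above. Real mzML files declare the namespace http://psi.hupo.org/ms/mzml, so lookups there need namespace-qualified tags.

```python
import xml.etree.ElementTree as ET

# Shortened, namespace-free excerpt modeled on the snippet above (assumption:
# real files are namespaced, so tags there would need the mzML namespace).
MZML = """
<mzML>
  <referenceableParamGroupList count="1">
    <referenceableParamGroup id="_sample1experiment1SpectrumParams">
      <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" />
      <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1" />
      <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" />
    </referenceableParamGroup>
  </referenceableParamGroupList>
  <run>
    <spectrumList count="1">
      <spectrum id="sample=1 period=1 cycle=3 experiment=1" index="82">
        <referenceableParamGroupRef ref="_sample1experiment1SpectrumParams" />
        <cvParam cvRef="MS" accession="MS:1000285" name="total ion current" value="169205" />
      </spectrum>
    </spectrumList>
  </run>
</mzML>
"""


def load_param_groups(root):
    """Map each referenceableParamGroup id to its direct cvParam children."""
    return {
        group.get("id"): group.findall("cvParam")
        for group in root.iter("referenceableParamGroup")
    }


def spectrum_params(spectrum, groups):
    """Return {accession: (name, value)} from referenced and direct cvParams."""
    params = {}
    # Expand referenced groups first; direct cvParams are applied afterwards,
    # so in this sketch they win if the same accession appears twice.
    for ref in spectrum.findall("referenceableParamGroupRef"):
        for cv in groups.get(ref.get("ref"), []):
            params[cv.get("accession")] = (cv.get("name"), cv.get("value"))
    for cv in spectrum.findall("cvParam"):
        params[cv.get("accession")] = (cv.get("name"), cv.get("value"))
    return params


root = ET.fromstring(MZML.strip())
groups = load_param_groups(root)
spectrum = root.find("./run/spectrumList/spectrum")
params = spectrum_params(spectrum, groups)
print(params["MS:1000511"])  # ms level, available only via the referenced group
```

The key point is that "ms level" (MS:1000511) is not a child of <spectrum> at all; a parser that only inspects the spectrum's own cvParams will not find it unless it first expands the referenceableParamGroupRef.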
MKoesters commented 6 years ago

Hi,

I'll have a look into this and will try to reimplement this feature. Thanks for reporting this!

Best, Manuel

MKoesters commented 6 years ago

@ftwkoopmans You may want to have a look at the fix_#92 branch. I had no appropriate file to test with, so I mocked one of my test files to reproduce your mzML. This is a quick fix and I will have a closer look, hopefully this weekend, but at least for my mocked file it does the job.

ftwkoopmans commented 6 years ago

@MKoesters thanks for looking into this. I've uploaded an mzML file for you to test with: https://surfdrive.surf.nl/files/index.php/s/cbahqYlPV7DSxvT

I've downloaded the fix_#92 branch, ran python setup.py install, and get the following error when parsing the mzML file:

  File "test_mzML.py", line 7, in <module>
    for spec in run:
  File "...\lib\site-packages\pymzml-2.0.3-py3.6.egg\pymzml\run.py", line 145, in __next__
    ms_level = spectrum.ms_level
  File "...\lib\site-packages\pymzml-2.0.3-py3.6.egg\pymzml\spec.py", line 851, in ms_level
    ns=self.ns
AttributeError: 'NoneType' object has no attribute 'get'

My test code:

    import pymzml
    import sys

    if __name__ == '__main__':
        path = sys.argv[1]
        run = pymzml.run.Reader(path)
        for spec in run:
            if spec.ms_level == 2:
                print(spec.ID)
                print(spec.selected_precursors)
                break

P.S. When I run the code on an mzML file without a <referenceableParamGroup>, I get:

Traceback (most recent call last):
  File "test_mzML.py", line 7, in <module>
    for spec in run:
  File "...\lib\site-packages\pymzml-2.0.3-py3.6.egg\pymzml\run.py", line 135, in __next__
    has_ref_group = self.info['referenceable_param_group_list']
KeyError: 'referenceable_param_group_list'
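The KeyError comes from indexing self.info['referenceable_param_group_list'] directly; that key is presumably only set when the file actually contains a <referenceableParamGroupList>. A minimal sketch of a defensive lookup, assuming info behaves like a plain dict (has_ref_group is a hypothetical helper, and this is not necessarily how the fix in pymzML was written):

```python
def has_ref_group(info):
    """Return True only if a referenceable param group list was recorded.

    Assumption: `info` is a plain dict. dict.get() returns the default
    instead of raising KeyError when the key is missing, which is the case
    for mzML files without a <referenceableParamGroupList>.
    """
    return bool(info.get("referenceable_param_group_list", False))


print(has_ref_group({}))  # file without groups
print(has_ref_group({"referenceable_param_group_list": True}))
```

Using a default keeps files with and without param groups on the same code path instead of requiring a separate try/except around every access.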
MKoesters commented 6 years ago

Thanks for providing an example file. I tested your code with your example file and with our standard example.mzML, and it should work now. As soon as I know it's fine for you, I'll merge this fix. Thanks again for reporting this and for using pymzML.

Best, Manuel

ftwkoopmans commented 6 years ago

I've just tested it with 2 different files; it works like a charm. Thanks 👍

Trapacology commented 4 years ago

Hello, I am new to programming and want to parse my mzML file with pymzml. Could someone please write an example script? Thanks in advance.

StSchulze commented 4 years ago

Hi @Trapacology,

thanks for giving pymzml a try.

Please note that if you need help with a specific problem, it is best to open a new issue instead of replying to a closed one.

The "example_scripts" folder contains many examples for various applications. In addition, the documentation offers a quick start guide with code examples: https://pymzml.readthedocs.io/en/latest/quick_start.html

If none of the above helps, please let us know about your specific problem or task.
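For readers with the same question, here is a minimal starter script that builds on the test code earlier in this thread. It relies only on pymzml.run.Reader and the ms_level attribute already used there; count_ms_levels is a hypothetical helper, and the getattr guard is a precaution in case the reader also yields objects (such as chromatograms) that lack an ms_level. The quick start guide linked above remains the authoritative reference.

```python
import sys
from collections import Counter


def count_ms_levels(run):
    """Count spectra per MS level in any iterable of spectrum-like objects.

    Objects without an ms_level attribute (or with ms_level None) are skipped;
    this guard is a precaution, not documented pymzML behavior.
    """
    levels = (getattr(spec, "ms_level", None) for spec in run)
    return Counter(level for level in levels if level is not None)


if __name__ == "__main__" and len(sys.argv) > 1:
    import pymzml  # third-party: pip install pymzml

    # Usage: python count_levels.py path/to/file.mzML
    run = pymzml.run.Reader(sys.argv[1])
    for level, n in sorted(count_ms_levels(run).items()):
        print(f"MS{level}: {n} spectra")
```

Keeping the counting logic in a function that accepts any iterable makes it easy to test without an mzML file and to reuse with other readers.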