pymzml / pymzML

pymzML - an interface between Python and mzML Mass spectrometry Files
https://pymzml.readthedocs.io/en/latest/
MIT License

Functionality broken in upgrade from #227

Closed: hmayes closed this issue 4 years ago.

hmayes commented 4 years ago

Describe the bug
Following https://pymzml.readthedocs.io/en/latest/quick_start.html, I started with:

    import pymzml

    run = pymzml.run.Reader(mzml_file)
    for n, spec in enumerate(run):
        ...  # process each spectrum

This works fine in pymzml 2.4.5, but breaks in 2.4.6.

To Reproduce
See above.

Expected behavior
Iteration as described.

Screenshots
Instead, I get this error (raised from the iteration line):

    File "/Users/hmayes/miniconda3/envs/lignin_kmc/lib/python3.6/site-packages/pymzml/run.py", line 154, in __next__
        spectrum.measured_precision = self.ms_precisions[ms_level]

Desktop (please complete the following information):

StSchulze commented 4 years ago

Hi Heather,

a couple of questions so that we can get a better idea of what's going on:

  1. Are you using the dev branch or the master branch (from Jan 22nd)? In the dev, we included the option to restart the iteration after reading through the file once, but in the master the iteration behavior hasn't changed recently as far as I know.

  2. I assume you are iterating over your file only once, i.e. the error occurs at the first iteration? At the first spec or somewhere in the middle/at the end?

  3. Could you include the full error message so that we can see what the actual error is?

  4. Would it be possible for you to upload the file so that we could try debugging it? Your code snippet certainly works on our test files.
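Stefan's first question mentions that the dev branch can restart iteration after one full pass through the file. The generic Python pattern behind that kind of behaviour is sketched below; the class names are hypothetical and this is not pymzml's actual implementation. An object that is its own iterator is exhausted after one pass, while an object whose `__iter__` returns a fresh iterator can be looped over repeatedly.

```python
class OneShotReader:
    """Hypothetical reader that can be iterated only once, like a plain generator."""

    def __init__(self, records):
        self._it = iter(records)

    def __iter__(self):
        # Returns itself, so a second for-loop continues from where the first stopped.
        return self

    def __next__(self):
        return next(self._it)


class RestartableReader:
    """Hypothetical reader whose __iter__ starts a fresh pass every time."""

    def __init__(self, records):
        # A real file reader would reopen or seek the file instead of keeping a list.
        self._records = list(records)

    def __iter__(self):
        # A new iterator per call, so every for-loop starts from the beginning.
        return iter(self._records)


one_shot = OneShotReader(["spec1", "spec2"])
first_pass = list(one_shot)
second_pass = list(one_shot)  # already exhausted, yields nothing

restartable = RestartableReader(["spec1", "spec2"])
pass_a = list(restartable)
pass_b = list(restartable)
```

For a reader of ~1 GB files, the restartable variant would re-read from disk on each pass rather than caching spectra in memory, which is why a single pass is still the cheap path.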

hmayes commented 4 years ago

Hello! I just noticed that you wrote back. Thank you!

1) I installed both pymzml 2.4.5 and 2.4.6 with pip; I assume the versions on PyPI are from the master branch:
https://pypi.org/project/pymzml/2.4.6/ (release date Jan 24, 2020)
https://pypi.org/project/pymzml/2.4.5/ (release date Sep 11, 2019)

2) Yes, I iterate only once (thankfully, since they are big files); see also my answer to 4. The error is thrown at the first spectrum, that is, as soon as it hits the line "for n, spec in enumerate(ms_run_data):", without ever reaching the line below it.

3) When I run the attached Python script with 2.4.6, I get:

    Traceback (most recent call last):
      File "test_pymzml.py", line 60, in <module>
        process_mzml_input("smaller_31HCD40_ESI+.mzML")
      File "test_pymzml.py", line 34, in process_mzml_input
        for n, spec in enumerate(ms_run_data):
      File "/Users/hmayes/miniconda3/lib/python3.7/site-packages/pymzml/run.py", line 154, in __next__
        spectrum.measured_precision = self.ms_precisions[ms_level]
    KeyError: 0

When running my tests with pytest, it helpfully also shows the function in pymzML where the error is found:

    self = <pymzml.run.Reader object at 0x1a178d53c8>

        def __next__(self):
            """
            Iterator for the class :py:class:Run.

            Iterates all of the spectra in the file.

            Returns:
                Spectrum (:py:class:`Spectrum`): a spectrum object with interface
                    to the original spectrum element.

            Example:

            >>> for spectrum in Reader:
            ...     print(spectrum.mz, end='\\r')

            """
            has_ref_group = self.info.get("referenceable_param_group_list", False)
            while True:
                event, element = next(self.iter, ("END", "END"))
                if event == "end":
                    if element.tag.endswith("}spectrum"):
                        spectrum = spec.Spectrum(element)
                        if has_ref_group:
                            spectrum._set_params_from_reference_group(
                                self.info["referenceable_param_group_list_element"]
                            )
                        ms_level = spectrum.ms_level
                        spectrum.measured_precision = self.ms_precisions[ms_level]

    E   KeyError: 0

    ../../../../miniconda3/lib/python3.7/site-packages/pymzml/run.py:154: KeyError

Note that as soon as I rolled back to pymzML 2.4.5, there is no longer any error.

4) Of course: temp_test_pyzml.zip. I included one of my test mzML files in case there is something different about it compared to your test files.
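The traceback above shows why the error appears on the `for` line: the spectrum is parsed and its precision looked up inside the reader's `__next__`, so a failed dictionary lookup surfaces before the loop body ever runs. The sketch below reproduces that pattern with the standard library only. The trimmed mzML snippet, the precision values, and the helper names are hypothetical; `MS:1000511` is the PSI-MS accession for "ms level", and the assumption that the UV trace is written with an ms level of 0 follows the diagnosis later in this thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed mzML: a UV trace reported as "ms level" 0, then an MS1 scan.
MZML = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="r1">
    <spectrumList count="2">
      <spectrum index="0" id="uv=1">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="0"/>
      </spectrum>
      <spectrum index="1" id="scan=1">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

NS = "{http://psi.hupo.org/ms/mzml}"

# Illustrative precision table with no entry for level 0 (placeholder values).
MS_PRECISIONS = {None: 1e-4, 1: 5e-6, 2: 2e-5}


def ms_level_of(spectrum_elem):
    """Return the integer 'ms level' (MS:1000511) of a <spectrum>, or None if absent."""
    for param in spectrum_elem.iter(NS + "cvParam"):
        if param.get("accession") == "MS:1000511":
            return int(param.get("value"))
    return None


def iter_spectra(source, precisions):
    """Yield (ms_level, precision) per spectrum, doing the lookup inside the iterator."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag.endswith("}spectrum"):
            level = ms_level_of(element)
            # Raises KeyError for any level missing from `precisions`.
            yield level, precisions[level]


seen = []
error = None
try:
    for n, (level, precision) in enumerate(iter_spectra(io.BytesIO(MZML), MS_PRECISIONS)):
        seen.append(level)  # never reached: the first spectrum already fails
except KeyError as exc:
    error = exc
```

Here `seen` stays empty and the caught exception is `KeyError(0)`, matching the behaviour Heather describes: the failure happens at the first spectrum, on the iteration line itself.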

Thanks! -Heather

StSchulze commented 4 years ago

Hi Heather, thanks for getting back to us with the details, that definitely helps a lot.

So the problem arises because some of the spectra in your mzML have an ms_level of 0, which so far isn't defined in our ms_precisions dictionary. I made a pull request (#228) that adds it and handles it the same way as spectra without the ms level tag. Could you check it out and see if that works for you? Your test script passes on the test file you gave us (thanks for that, it made the fix really straightforward).
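The fix described above (treat MS level 0 like a spectrum with no MS level tag) can be illustrated with a plain dictionary. This is a minimal sketch, not the code from #228: the helper names and precision values are hypothetical placeholders, not pymzml's defaults.

```python
# Hypothetical precision table; values are placeholders, not pymzml's defaults.
BASE_PRECISIONS = {None: 1e-4, 1: 5e-6, 2: 2e-5}


def with_level_zero(precisions):
    """Return a copy where MS level 0 gets the same precision as spectra with no MS level."""
    patched = dict(precisions)
    # setdefault keeps an explicit level-0 entry if one is already present.
    patched.setdefault(0, patched[None])
    return patched


PRECISIONS = with_level_zero(BASE_PRECISIONS)


def precision_for(ms_level, precisions=PRECISIONS):
    """Look up the measured precision for an MS level (still KeyError if unknown)."""
    return precisions[ms_level]
```

Adding an explicit entry keeps the lookup strict: a genuinely unexpected level still raises, whereas a blanket `precisions.get(level, default)` would silently accept anything.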

Besides that, out of curiosity, where does the "electromagnetic radiation spectrum" come from, i.e. what kind of instrument and raw file converter are you using?

Best, Stefan

PS: Yes, installing through PyPI uses the master branch.

hmayes commented 4 years ago

Huzzah!!! That fixes it! Thank you very much for your responses and especially the fix.

As for the details on the MS: I am not an analytical chemist. I'm a computational chemical engineer enlisted to help some analytical chemists crunch their data more efficiently (using existing packages like pymzML when possible; thank you for publicly sharing it!), and I did not run or convert the files myself. The analytical chemist used a Waters Acquity UPLC system equipped with a Phenomenex Kinetex® 1.7 µm EVO C18 column (100 x 2.1 mm); I do not know what program he used to convert the data, but I can certainly ask. As you may have guessed, I greatly trimmed the file I gave you to make testing faster (instead of the ~1 GB files I often get). The MS 0 / electromagnetic radiation spectrum data is absorbance/UV detection at 210 nm, which I do not use in my analysis; I "continue" to the next spectrum, keeping only the MS1 and MS2 data.
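The filtering Heather describes (skip the level-0 UV spectra with `continue`, keep MS1 and MS2) can be sketched as a small generator. `FakeSpectrum` is a hypothetical stand-in that models only the `ms_level` attribute seen in the traceback above, so the sketch runs without pymzml or an mzML file. Note that on pymzml 2.4.6 the `KeyError` is raised inside the reader's own iteration, before a filter like this gets to skip anything; it only helps once the reader itself accepts level 0.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class FakeSpectrum:
    """Hypothetical stand-in for a spectrum object; only ms_level and an id are modeled."""
    ms_level: Optional[int]
    scan_id: str


def keep_ms1_ms2(spectra: Iterable[FakeSpectrum]) -> Iterator[FakeSpectrum]:
    """Yield only MS1 and MS2 spectra, skipping level 0 (e.g. UV traces) and anything else."""
    for spec in spectra:
        if spec.ms_level not in (1, 2):
            continue  # e.g. the 210 nm absorbance "electromagnetic radiation spectrum"
        yield spec


run = [
    FakeSpectrum(0, "uv=1"),
    FakeSpectrum(1, "scan=1"),
    FakeSpectrum(2, "scan=2"),
    FakeSpectrum(0, "uv=2"),
]
kept = [s.scan_id for s in keep_ms1_ms2(run)]
```

Here `kept` is `["scan=1", "scan=2"]`. Wrapping the filter in a generator keeps the processing loop itself free of level checks and still streams spectra one at a time, which matters for large files.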

Thank you again. Please let me know if you want me to follow up on the converter type, MS0, etc.

My best, -Heather

StSchulze commented 4 years ago

Great, I'm glad it works. It will be merged into master and included in the next PyPI release in a few weeks.

The info that it's UV detection already helps, thanks.