mobiusklein / ms_deisotope

A library for deisotoping and charge state deconvolution of complex mass spectra
https://mobiusklein.github.io/ms_deisotope
Apache License 2.0

Importing mzML file with only ms1 data #14

Closed realperson999 closed 4 years ago

realperson999 commented 5 years ago

Hello,

I wanted to experiment with this deconvolution package. I'm looking at some ms1 level data and I'm having trouble importing it.

Using MSFileLoader, I'm able to import a file with ms2 results just fine, giving ms_deisotope.data_source.scan.scan.Scan objects. Using the same workflow on an mzML file with only ms1 data, or with both ms1 and ms2 data, seems to produce a different class whenever there's an ms1 scan, ms_deisotope.data_source.scan.base.ScanBunch, and I'm not seeing any output data from this, just an empty ScanBunch.

ScanBunch(
precursor=
Scan('controllerType=0 controllerNumber=1 scan=1', index=0, time=0.0022, ms_level=1),
products=
[])

I can import the same mzML files that include the ms1 scans using mzml.mzml.read or mzml.mzml.MzML with the desired output. But those output dicts, not Scan objects, and dicts don't seem to work with the deconvolution bits.

Is there a better way to do this?

Edit: I had to rewrite some of this because I made minor mistakes in the original message.

mobiusklein commented 5 years ago

Hello,

You're running into a failing in my ability to document iteration strategies: https://mobiusklein.github.io/ms_deisotope/docs/_build/html/data_source/common_reader.html#iteratation-strategies.

If you have an mzML file which contains only MSn scans, the default iteration strategy that gets used is to produce single Scan objects, what I call "single" mode. If MS1 scans are present, the default iteration strategy is to produce ScanBunch objects, which have a precursor attribute, the MS1 Scan object, and a products attribute, a list of 0 or more Scan objects with an ms_level > 1 derived from precursor, what I call "grouped" mode. If an MS1 scan does not have any MSn scans, the resulting bunch's products list will be empty. If you iterate all the way through your file, do you ever see MSn scans in the products list of the produced bunches?
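One way to answer that question is to walk the grouped iterator and count how many bunches carry any products. The sketch below is only an illustration: `Scan` and `ScanBunch` here are hypothetical namedtuple stand-ins that mimic the `precursor` and `products` attributes described above, not the real ms_deisotope classes, and `count_bunches_with_products` is a made-up helper name.

```python
from collections import namedtuple

# Hypothetical stand-ins that only mimic the attributes named in this thread.
Scan = namedtuple("Scan", ["id", "ms_level"])
ScanBunch = namedtuple("ScanBunch", ["precursor", "products"])


def count_bunches_with_products(bunches):
    """Return (total bunches, bunches whose products list is non-empty)."""
    total = 0
    with_products = 0
    for bunch in bunches:
        total += 1
        if bunch.products:  # an empty list means this MS1 scan has no MSn scans
            with_products += 1
    return total, with_products


demo = [
    ScanBunch(Scan("scan=1", 1), []),
    ScanBunch(Scan("scan=2", 1), [Scan("scan=3", 2), Scan("scan=4", 2)]),
]
total, with_products = count_bunches_with_products(demo)
print(total, with_products)
```

With a real file, the same function should accept the reader returned by MSFileLoader in place of `demo`, assuming it is iterating in grouped mode.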

You have to unpack ScanBunch objects to reach the individual scans yourself.
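A minimal sketch of that unpacking, again using hypothetical namedtuple stand-ins rather than the real ms_deisotope classes. It assumes only that each bunch exposes `precursor` and `products`, and it guards against a missing precursor as a precaution. `iter_scans` is an illustrative name, not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-ins that only mimic the attributes named in this thread.
Scan = namedtuple("Scan", ["id", "ms_level"])
ScanBunch = namedtuple("ScanBunch", ["precursor", "products"])


def iter_scans(bunches):
    """Yield every scan from a stream of bunches: precursor first, then products."""
    for bunch in bunches:
        if bunch.precursor is not None:
            yield bunch.precursor
        for product in bunch.products:
            yield product


demo = [
    ScanBunch(Scan("scan=1", 1), []),
    ScanBunch(Scan("scan=2", 1), [Scan("scan=3", 2)]),
]
for scan in iter_scans(demo):
    print(scan.id, scan.ms_level)
```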

If you want to work with single Scan objects regardless of file content, you can call reader.make_iterator(grouped=False) where reader is the object returned by MSFileLoader, but this will interleave MS1 and MSn scans without differentiation, so you'll need to check scan.ms_level to tell them apart.
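Telling the interleaved scans apart can be sketched as follows. The `Scan` namedtuple is a hypothetical stand-in carrying only the `ms_level` attribute mentioned above, and `split_by_ms_level` is an illustrative helper, not something ms_deisotope provides.

```python
from collections import namedtuple

# Hypothetical stand-in that only mimics the ms_level attribute.
Scan = namedtuple("Scan", ["id", "ms_level"])


def split_by_ms_level(scans):
    """Separate a flat stream of scans into (MS1 scans, MSn scans)."""
    ms1_scans = []
    msn_scans = []
    for scan in scans:
        if scan.ms_level == 1:
            ms1_scans.append(scan)
        else:
            msn_scans.append(scan)
    return ms1_scans, msn_scans


demo = [Scan("scan=1", 1), Scan("scan=2", 2), Scan("scan=3", 2), Scan("scan=4", 1)]
ms1_scans, msn_scans = split_by_ms_level(demo)
print(len(ms1_scans), len(msn_scans))
```

With a real file, the idea would be to call reader.make_iterator(grouped=False) and then pass the reader to this function instead of `demo`, assuming the reader can be iterated directly after that call.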

ms_deisotope.data_source.mzml implements the mzML parsing machinery, which in turn imports pyteomics.mzml, the parser you were using that just produced bare dict objects. ms_deisotope wraps those dict objects in Scan objects bound to their reader, which know how to get specific properties by name and plug into all of the signal processing algorithms in ms_deisotope and ms_peak_picker.

realperson999 commented 5 years ago

Thanks, this is very helpful. Good to know that the blank ScanBunch wasn't actually blank; I didn't dig deep enough.

Cheers!

mobiusklein commented 5 years ago

Happy to help.

By the way, since you're working with MSn spectra with MS1 spectra available: if you're interested in recalculating the precursor peak so that it points at the monoisotopic peak, scan.precursor_information.correct_mz and scan.precursor_information.find_monoisotopic_peak can do this too. These operations are done automatically by the deconvolution pipeline, ms_deisotope.ScanProcessor.

mobiusklein commented 5 years ago

Just checking in to see if this issue was resolved and can be closed. Please let me know if you had any other questions or if we were able to solve your problem.

realperson999 commented 4 years ago

Yes! This is all solved. Sorry, I didn't make my previous comment concise enough. Thanks for your help.

mobiusklein commented 4 years ago

Thank you for the clarification. I'll close this issue then.