levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Deisotoping Methodology - future pyteomics implementation? #127

CCranney closed this issue 9 months ago

CCranney commented 9 months ago

Hi,

I'm researching ways to deisotope mzXML files from mass spec runs via Python (or similarly accessible, free, simple, and preferably open-source methods). One option is pyopenms, but I am finding that its process loses detail: floats are rounded to fewer decimal places, and additional metadata is lost in translation. There are other avenues I am pursuing, but I thought I'd check whether deisotoping exists in pyteomics, as I already use it in my programs. I found the following code in the pyteomics/pyteomics/_schema_defaults.py file:

_mzxml_schema_defaults = {'bools': {('dataProcessing', 'centroided'),
                                 ('dataProcessing', 'chargeDeconvoluted'),
                                 ('dataProcessing', 'deisotoped'),

This leads me to wonder whether this is a planned future feature, or whether it already exists (I have not found it if so). Is this something that could be implemented in the future?

levitsky commented 9 months ago

Hi @CCranney, the code you found handles the parsing of metadata stored in XML files about prior processing. In this case, if the mzXML file indicates that the stored spectra have been deisotoped, this bit is responsible for representing that information correctly in the parsed output of Pyteomics. Unfortunately, it is not related to any current or future implementation of deisotoping in Pyteomics, nor am I aware of any plans to add it. Suggestions and contributions are always welcome, although data processing has not been the focus of Pyteomics as much as plain parsing. If you are looking for an existing solution, I would look at other packages, e.g.: https://github.com/mobiusklein/ms_deisotope
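
To illustrate the point about metadata, here is a minimal sketch of what "parsing a boolean processing flag" amounts to. It uses only the standard library and is not Pyteomics's actual implementation. The XML fragment, `BOOL_KEYS`, `local_name`, `to_bool`, and `processing_flags` are hypothetical names made up for this example; only the three (element, attribute) pairs come from the snippet quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified mzXML fragment. Real files carry an XML
# namespace and many more elements; this is only enough to show the idea.
SAMPLE = """<mzXML>
  <msRun>
    <dataProcessing centroided="1" deisotoped="0" chargeDeconvoluted="false"/>
  </msRun>
</mzXML>"""

# (element, attribute) pairs to read as booleans, mirroring the 'bools'
# entries quoted from _schema_defaults.py in this thread.
BOOL_KEYS = {
    ("dataProcessing", "centroided"),
    ("dataProcessing", "chargeDeconvoluted"),
    ("dataProcessing", "deisotoped"),
}


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def to_bool(text):
    """Interpret an XML Schema boolean ('true'/'false' or '1'/'0')."""
    return text.strip().lower() in ("true", "1")


def processing_flags(xml_text):
    """Collect the boolean processing flags recorded in the document."""
    flags = {}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        for key, value in elem.attrib.items():
            if (name, key) in BOOL_KEYS:
                flags[key] = to_bool(value)
    return flags


flags = processing_flags(SAMPLE)
print(flags)
```

The flag only reports what an upstream tool claims to have done to the spectra. Reading it as `False` does not trigger any deisotoping; that step would still have to come from a separate package.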

CCranney commented 9 months ago

I appreciate the explanation and recommendation, thank you @levitsky!