levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Support for Ion Mobility data from Bruker devices? #151

Open LoponteHF opened 1 month ago

LoponteHF commented 1 month ago

Good day.

Is there any plan, or an estimated timeline, for implementing access to ion mobility data from Bruker devices?

I use Pyteomics regularly, and now I'm starting to work with timsTOF data that I can't open with it unless I collapse the ion mobility dimension when converting to mzXML/mzML with MSConvert. I'd like to explore that data in the foreseeable future, so I'm looking into the possibilities.

I know there are other packages available, but I'd prefer to keep using a single package to access all sorts of data in mzXML/mzML, and Pyteomics has been really helpful so far.

Thanks in advance.

levitsky commented 1 month ago

Hi @LoponteHF,

Could you specify what kind of data you mean? If you're having trouble reading mzML/mzXML files with Pyteomics, then how exactly do you obtain them? If you mean reading the native Bruker format, then I'm afraid I don't have an estimate on that.

LoponteHF commented 1 month ago

I mean Bruker timsTOF data acquired with ion mobility (you can also turn it off during acquisition, and then the mzML/mzXML files work normally) and converted to mzML/mzXML. It doesn't matter whether MSConvert or Bruker Compass Data Analysis is used for the conversion: I can't read the result with Pyteomics unless I collapse the ion mobility dimension using MSConvert. I can provide you a sample file, if you wish.

levitsky commented 1 month ago

Can you provide a sample file, or perhaps instructions on how to create one (or both)? I have heard about issues when converting data with ion mobility, but I have not seen any issues reading the resulting mzML files with Pyteomics.

mobiusklein commented 1 month ago

Two issues with reading large timsTOF files converted to mzML are that indexing them can take a very long time when they are first opened, and that the index itself may consume a lot of memory.

The first could be solved by using PreIndexedMzML, which we may want to make the default, since the bug in ProteoWizard that corrupted the byte offset indices was fixed a long time ago. For the second, I don't have a good solution.
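The start-up cost comes from building a map of spectrum id to byte offset. Without a usable embedded index the reader has to stream through the whole file looking for `<spectrum>` tags, while the idea behind a pre-indexed reader is to jump to the `<indexList>` that converters write at the end of an indexed mzML file. The stdlib-only sketch below contrasts the two strategies. It is an illustration of the idea, not Pyteomics' implementation; it assumes each `<spectrum ... id="...">` start tag sits on one line and that the file follows the `indexedmzML` layout (`<indexList>` followed by `<indexListOffset>`). The names `scan_offsets` and `read_embedded_offsets` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Start tag of a spectrum element; captures the value of its id attribute.
# Assumption: the whole start tag is on one line, as in typical converter output.
SPECTRUM_START = re.compile(rb'<spectrum\s[^>]*\bid="([^"]+)"')
# Trailer element of indexed mzML holding the byte position of <indexList>.
INDEX_LIST_OFFSET = re.compile(rb'<indexListOffset>\s*(\d+)\s*</indexListOffset>')


def scan_offsets(path):
    """Slow path: stream through the whole file and record where each <spectrum> starts."""
    offsets = {}
    position = 0  # byte offset of the current line's first byte
    with open(path, 'rb') as fh:
        for line in fh:
            match = SPECTRUM_START.search(line)
            if match:
                offsets[match.group(1).decode('utf-8')] = position + match.start()
            position += len(line)
    return offsets


def read_embedded_offsets(path, tail_size=4096):
    """Fast path: read only the file tail and the index it points to."""
    with open(path, 'rb') as fh:
        fh.seek(0, 2)  # jump to the end to learn the file size
        size = fh.tell()
        fh.seek(max(0, size - tail_size))
        match = INDEX_LIST_OFFSET.search(fh.read())
        if match is None:
            raise ValueError('no <indexListOffset> found; not an indexed mzML file')
        fh.seek(int(match.group(1)))  # jump straight to <indexList>
        rest = fh.read()
    end = rest.index(b'</indexList>') + len(b'</indexList>')
    # The namespace is normally declared on <indexedmzML>, so this fragment
    # parses without one and plain tag names can be used below.
    index_list = ET.fromstring(rest[:end])
    offsets = {}
    for index in index_list.findall('index'):
        if index.get('name') == 'spectrum':
            for entry in index.findall('offset'):
                offsets[entry.get('idRef')] = int(entry.text)
    return offsets
```

Both functions still keep one dictionary entry per spectrum, so the pre-indexed route only shortens start-up time and does nothing for the memory footprint of the index. In Pyteomics itself, `mzml.PreIndexedMzML(path)` is opened the same way as `mzml.MzML(path)`.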

LoponteHF commented 1 month ago

That might be the problem I'm facing. I'm currently converting the files and will run some tests on them, then get back to you. However, it looks like the files will be too big, so I'm looking into trimming them and applying some threshold to make them smaller. After that I can also send them to you, if necessary.

Thank you for your attention and I will get back to you soon.