levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Extract date stamp of data acquisition? #97

Closed sorenwacker closed 1 year ago

sorenwacker commented 1 year ago

Hi,

I asked this question on bioinformatics.stackexchange, and one user suggested:

from pyteomics import mzml

with mzml.MzML('tests/test.mzML') as f:
    print(next(f.iterfind('run', recursive=False))['startTimeStamp'])

I wonder if there is a better way to extract the datestamp of data acquisition with pyteomics?

levitsky commented 1 year ago

Hi, that was my answer, so I approve it :)

The XML parsing API is very simple: a single class (in this case MzML) reads spectra by default, and a workhorse method, iterfind, does all the work. When you need to read something that is not part of the spectrum data, you call iterfind directly with the name of the element you need.

Passing recursive=False saves a lot of time here: the run element contains all the spectra, so otherwise iterfind would parse every one of them only to discard them immediately.

The code above with recursive=False should be fast enough, and I don't think there is a way to do it faster. In terms of the length of code, there are no shortcuts for this, but you are free to use this code to define your own function.
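As a rough sketch of such a function: the names get_acquisition_time and parse_start_timestamp are made up for illustration and are not part of pyteomics. It assumes the timestamp is in a form that datetime.fromisoformat accepts, and that the attribute may be missing from some files.

```python
from datetime import datetime


def parse_start_timestamp(stamp):
    """Convert a timestamp string such as '2019-03-14T10:15:30Z' to a datetime.

    Hypothetical helper; assumes an ISO 8601 form that fromisoformat accepts.
    """
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if stamp.endswith('Z'):
        stamp = stamp[:-1] + '+00:00'
    return datetime.fromisoformat(stamp)


def get_acquisition_time(path):
    """Return the run's startTimeStamp as a datetime, or None if it is absent.

    Hypothetical wrapper around the snippet from this thread.
    """
    # Imported here so the parsing helper above works without pyteomics.
    from pyteomics import mzml

    with mzml.MzML(path) as reader:
        # recursive=False avoids parsing every spectrum nested inside <run>.
        run = next(reader.iterfind('run', recursive=False))
        stamp = run.get('startTimeStamp')
    return parse_start_timestamp(stamp) if stamp is not None else None
```

Calling get_acquisition_time('tests/test.mzML') would then give a datetime object (or None) instead of the raw string, which is handier for sorting or comparing files by acquisition date.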

sorenwacker commented 1 year ago

That is funny, I did not notice it was you. Thank you!