Closed mwang87 closed 2 years ago
Hello,
It's likely that this is one of those features that I failed to document in Sphinx (or perhaps at all). Whenever pyteomics parses a cvParam with a unit, instead of converting the value of the param into a plain primitive like float or str, it will be converted into a unitfloat or unitstr, which has an extra attribute unit_info. The unit_info will contain the unit name, or its accession code if the name is omitted.
I wanted to include the name of the unit in the repr, but that broke some libraries, so it is only in the _repr_pretty_ hook used by IPython.
Given the XML:
<scanList count="1">
  <cvParam cvRef="MS" accession="MS:1000795" name="no combination" value=""/>
  <scan>
    <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="0.004935" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
    <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value="FTMS + p ESI Full ms [200.00-2000.00]"/>
    <cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="1"/>
    <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="68.227485656738" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
    <scanWindowList count="1">
      <scanWindow>
        <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="200.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
        <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="2000.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
      </scanWindow>
    </scanWindowList>
  </scan>
</scanList>
You get the following dict:
{'count': 1,
 'scan': [{'scanWindowList': {'count': 1,
                              'scanWindow': [{'scan window lower limit': 200.0 m/z,
                                              'scan window upper limit': 2000.0 m/z}]},
           'scan start time': 0.004935 minute,
           'filter string': 'FTMS + p ESI Full ms [200.00-2000.00]',
           'preset scan configuration': 1.0,
           'ion injection time': 68.227485656738 millisecond}],
 'no combination': ''}
To access the unit, you might write something like this:
>>> scan['scanList']['scan'][0]['scan start time'].unit_info
'minute'
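If the goal is to get retention times into one unit no matter what the file uses, a small helper keyed on unit_info may be enough. This is only a sketch: to_seconds, the conversion table, and DemoUnitFloat are hypothetical names that are not part of pyteomics, and DemoUnitFloat is just a stand-in so the snippet runs without the library installed. With real data you would pass the value read from the spectrum dict instead.

```python
# Conversion factors to seconds, keyed by unit name or UO accession.
# unit_info may hold either one, so both forms are listed.
# This table is an assumption for illustration; extend it as needed.
SECONDS_PER_UNIT = {
    "second": 1.0,
    "seconds": 1.0,
    "UO:0000010": 1.0,
    "minute": 60.0,
    "UO:0000031": 60.0,
    "millisecond": 1e-3,
    "UO:0000028": 1e-3,
}


def to_seconds(value, default_unit="second"):
    """Convert a time value to seconds using its unit_info attribute.

    Plain floats (no unit_info) are assumed to be in default_unit.
    """
    unit = getattr(value, "unit_info", None) or default_unit
    try:
        factor = SECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unrecognized time unit: {unit!r}") from None
    return float(value) * factor


class DemoUnitFloat(float):
    """Stand-in for a unit-carrying float; NOT the pyteomics class."""

    def __new__(cls, value, unit_info=None):
        obj = super().__new__(cls, value)
        obj.unit_info = unit_info
        return obj


# With pyteomics this value would come from something like
# scan['scanList']['scan'][0]['scan start time'].
start_time = DemoUnitFloat(0.004935, "minute")
print(to_seconds(start_time))
```

Raising on an unknown unit rather than guessing is deliberate here, since silently treating minutes as seconds is exactly the kind of mistake this is meant to prevent.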
Awesome, will give it a try! Was doing a double parse with pymzml and it was not a good time.
@mwang87 I added documentation to describe how units are handled at https://pyteomics.readthedocs.io/en/latest/data.html#unit-handling. Does this sufficiently describe them for your purposes?
@mobiusklein This is great. This would have cleared it up the first time around (no worries, my own documentation is not great!).
But overall, it worked like a charm. Thanks so much for being awesome!
Best,
Ming
It is not clear how to distinguish seconds from minutes in the retention time for scans in mzML files. The unit is specified in the mzML file itself, where the unitName is "seconds".
However, when iterating through the spectra with the mzml reader, the resulting data structure includes the scan start time, but not the unit.