levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Properly detect non-standard array name when it is a term value #48

Closed mobiusklein closed 3 years ago

mobiusklein commented 3 years ago

Right now, we can detect the name of a non-standard data array in an mzML binaryDataArray if it is given as a userParam with the word "array" in its name.

<binaryDataArray encodedLength="20">
  <cvParam cvRef="PSI-MS" accession="MS:1000786" name="non-standard data array"/>
  <userParam name="frobnication wavelength array"/>
  <cvParam cvRef="PSI-MS" accession="MS:1000574" name="zlib compression" value=""/>
  <cvParam cvRef="PSI-MS" accession="MS:1000521" name="32-bit float" value=""/>
  <binary>eJxjYGiwZxhBGADvOCzF</binary>
</binaryDataArray>

According to the spec and CV, the name might instead be given as the value of the "non-standard data array" cvParam:

<binaryDataArray encodedLength="20">
  <cvParam cvRef="PSI-MS" accession="MS:1000786" name="non-standard data array" value="frobnication wavelength array"/>
  <cvParam cvRef="PSI-MS" accession="MS:1000574" name="zlib compression" value=""/>
  <cvParam cvRef="PSI-MS" accession="MS:1000521" name="32-bit float" value=""/>
  <binary>eJxjYGiwZxhBGADvOCzF</binary>
</binaryDataArray>

Currently, we would just return "non-standard data array" when evaluating the second example. This PR makes it properly return the non-standard name.
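For illustration, the lookup order described above can be sketched as follows. This is a minimal, standalone sketch and not the actual pyteomics code: the function name `non_standard_array_name` and the constants are hypothetical, and it assumes the `binaryDataArray` element has no XML namespace, as in the snippets above (real mzML files declare one).

```python
import xml.etree.ElementTree as ET

# Accession and name of the "non-standard data array" term, as in the snippets above.
NON_STANDARD_ACCESSION = "MS:1000786"
NON_STANDARD_NAME = "non-standard data array"

# The two forms from the PR description (no XML namespace, unlike real mzML).
USER_PARAM_FORM = """
<binaryDataArray encodedLength="20">
  <cvParam cvRef="PSI-MS" accession="MS:1000786" name="non-standard data array"/>
  <userParam name="frobnication wavelength array"/>
  <binary>eJxjYGiwZxhBGADvOCzF</binary>
</binaryDataArray>
"""

TERM_VALUE_FORM = """
<binaryDataArray encodedLength="20">
  <cvParam cvRef="PSI-MS" accession="MS:1000786" name="non-standard data array" value="frobnication wavelength array"/>
  <binary>eJxjYGiwZxhBGADvOCzF</binary>
</binaryDataArray>
"""


def non_standard_array_name(binary_data_array_xml):
    """Hypothetical helper: pick the most specific name for a non-standard array."""
    element = ET.fromstring(binary_data_array_xml)

    # 1. Prefer a non-empty value on the "non-standard data array" cvParam.
    for param in element.findall("cvParam"):
        if param.get("accession") == NON_STANDARD_ACCESSION:
            value = param.get("value")
            if value:
                return value

    # 2. Otherwise look for a userParam with the word "array" in its name.
    for param in element.findall("userParam"):
        name = param.get("name", "")
        if "array" in name:
            return name

    # 3. Fall back to the generic term name.
    return NON_STANDARD_NAME


print(non_standard_array_name(USER_PARAM_FORM))
print(non_standard_array_name(TERM_VALUE_FORM))
```

Under these assumptions, both forms resolve to the same specific name, and the generic term name is returned only when neither source provides one.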

I also updated the array name set, since several new ion-mobility-related array types have been added. These names are only used as a last resort when trying to determine which term really represents the array's name.
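A last-resort lookup of this kind could look roughly like the sketch below. The set contents and the helper `fallback_array_name` are illustrative assumptions, not the set or function actually used in pyteomics; only a few array term names are listed.

```python
# Illustrative subset of array term names; the real set in the library is larger.
KNOWN_ARRAY_NAMES = {"m/z array", "intensity array", "mean ion mobility array"}


def fallback_array_name(param_names, known_names=KNOWN_ARRAY_NAMES):
    """Hypothetical helper: return the first parameter name that is a known array name."""
    for name in param_names:
        if name in known_names:
            return name
    # No parameter matched a known array name.
    return None


print(fallback_array_name(["zlib compression", "32-bit float", "intensity array"]))
```

Because the check is a plain membership test, array types that are missing from the set would not be recognized at this stage, which is why the set needs updating when new terms are added to the CV.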