Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
Trying to parse a Waters Apex3D-generated mzML file crashes with a misleading error:
```
/site-packages/pyteomics/mzml.pyc in _handle_binary(self, info, **kwargs)
    237         dtype = self._determine_array_dtype(info)
    238         compressed = self._determine_compression(info)
--> 239         name = self._detect_array_name(info)
    240         binary = info.pop('binary')
    241         if not self.decode_binary:

/site-packages/pyteomics/mzml.pyc in _detect_array_name(self, info)
    137         candidates = []
    138         for k in info:
--> 139             if k.endswith(' array') and not info[k]:
    140                 if NON_STANDARD_DATA_ARRAY == k:
    141                     is_non_standard = True

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
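The `ValueError` is generic NumPy behavior, which is why it says nothing about mzML: `not x` asks for the truth value of `x`, and NumPy refuses to give one for an array with more than one element. A minimal reproduction outside pyteomics, where the array merely stands in for an already-decoded data array:

```python
import numpy as np

# Stand-in for an already-decoded data array (values are arbitrary).
decoded = np.array([100.0, 200.5, 301.2])

try:
    not decoded  # the same test as `not info[k]` at line 139 of the traceback
    message = None
except ValueError as exc:
    message = str(exc)

# Explicit checks are unambiguous and do not raise.
is_empty = decoded.size == 0
```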
This happens because one of the data arrays is neither declared non-standard nor accompanied by a userParam whose name ends with "array", so our recovery strategy doesn't work and `_detect_array_name` returns the name `"binary"`.
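Because the info dict then contains a `"binary"` key, the next time this object goes through `_get_info_smart`, the whole spectrum goes back through `_handle_binary` again, and that is when the error hits: `_detect_array_name` now iterates over the spectrum's keys and evaluates `not info[k]` on an already-decoded NumPy array stored under a key ending in " array".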
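The round trip can be simulated in a few lines. The three functions below are hypothetical, heavily simplified stand-ins for `_detect_array_name`, `_handle_binary` and `_get_info_smart`: there is no dtype or compression handling, and the "dispatch on a `binary` key" rule is an assumption taken from the description above, not pyteomics' actual code. They only reproduce the control flow: an undeclared array ends up stored under `"binary"`, and that key later sends the whole spectrum back through the binary handler.

```python
import base64

import numpy as np


def detect_array_name(info):
    """Hypothetical, simplified stand-in for mzml._detect_array_name."""
    candidates = [k for k in info if k.endswith(' array') and not info[k]]
    # No parameter name ending in " array": fall back to "binary".
    return candidates[0] if candidates else 'binary'


def handle_binary(info):
    """Hypothetical stand-in: decode the payload, store it under the detected name."""
    name = detect_array_name(info)
    payload = info.pop('binary')
    info[name] = np.frombuffer(base64.b64decode(payload), dtype=np.float64)
    return info


def get_info_smart(info):
    """Hypothetical stand-in: any dict with a 'binary' key is treated as a data array."""
    if 'binary' in info:
        return handle_binary(info)
    return info


raw = base64.b64encode(np.array([1.0, 2.0, 3.0]).tobytes()).decode()

# Declared array: the valueless "m/z array" parameter names it.
mz = get_info_smart({'m/z array': '', 'binary': raw})
# Undeclared array: nothing ends in " array", so the data lands under "binary".
odd = get_info_smart({'binary': raw})

# Merging the arrays into the spectrum reintroduces a "binary" key...
spectrum = {'id': 'scan=1'}
spectrum.update(mz)
spectrum.update(odd)

# ...so the spectrum itself goes back through handle_binary and
# `not info['m/z array']` is evaluated on a decoded array.
error = None
try:
    get_info_smart(spectrum)
except ValueError as exc:
    error = str(exc)
```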
There are a lot of comments on `_detect_array_name`, one of which explicitly states that returning `"binary"` signals special handling elsewhere: https://github.com/levitsky/pyteomics/blob/master/pyteomics/mzml.py#L157-L165. I think I added something around this to support a different Waters-generated mzML file three or four years ago, but without the malformed mzML file from back then, I don't know what I was trying to recover from. I'm going to work a bit harder on accepting any valid parameter name as the array name.
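One possible shape for accepting any valid parameter name is sketched below. It is a hypothetical standalone function, not pyteomics' API. It treats only valueless string entries as name flags (so decoded arrays are never truth-tested), prefers names ending in " array", then accepts any other valueless parameter, and finally falls back to a placeholder (`'unnamed array'`, an arbitrary choice) instead of `"binary"`. The exclusion set lists a few PSI-MS dtype and compression term names purely for illustration; the real code determines those separately and may already remove them from `info`.

```python
import numpy as np

# Illustrative (not exhaustive) keys that describe an array rather than name it.
NOT_A_NAME = {
    'binary',
    '32-bit float', '64-bit float', '32-bit integer', '64-bit integer',
    'zlib compression', 'no compression',
}


def _is_name_flag(value):
    # A parameter used as an array name carries no value (an empty string).
    # Checking the type first means decoded NumPy arrays are never truth-tested.
    return isinstance(value, str) and value == ''


def detect_array_name_safe(info, fallback='unnamed array'):
    """Hypothetical, more tolerant array-name detection."""
    flagged = [k for k, v in info.items()
               if k not in NOT_A_NAME and _is_name_flag(v)]
    # 1. Standard and declared non-standard names ending in " array" win.
    for key in flagged:
        if key.endswith(' array'):
            return key
    # 2. Otherwise accept any other valueless parameter name.
    if flagged:
        return flagged[0]
    # 3. Last resort: a placeholder that is not "binary", so decoded data
    #    can never re-trigger binary handling through a "binary" key.
    return fallback


# Usage (parameter names other than "m/z array" are made up):
print(detect_array_name_safe({'64-bit float': '', 'm/z array': '', 'binary': 'AAA'}))
# → m/z array
print(detect_array_name_safe({'64-bit float': '', 'vendor trace': '', 'binary': 'AAA'}))
# → vendor trace
print(detect_array_name_safe({'m/z array': np.array([1.0, 2.0]), 'binary': 'AAA'}))
# → unnamed array
```

Whether the last-resort name can safely differ from `"binary"` depends on the special handling mentioned in the linked comment; the sketch assumes that handling could key off a different placeholder.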