levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Load mzML file with ion mobility #116

Closed mobiotwin closed 11 months ago

mobiotwin commented 11 months ago

Hi,

We are planning to switch from pyopenms.

However, I ran into an issue loading an mzML file with ion mobility data.

My code:


from pyteomics import mzml

with mzml.read(
    "../../mzML/20230714_6Mix_0_NA_MS2_RP_HDMSe_POS_N02.mzML",) as reader:
    for spectrum in reader:
        print(spectrum)

It iterates over the first two indices and then crashes.

The output for the first two indices:

{'index': 0, 'id': 'merged=1 function=3 block=1', 'defaultArrayLength': 1697, 'dataProcessingRef': 'pwiz_Reader_Waters_conversion', 'scanList': {'count': 1, 'scan': [{'scanWindowList': {'count': 1, 'scanWindow': [{'scan window lower limit': 50.0, 'scan window upper limit': 1200.0}]}, 'preset scan configuration': 3.0, 'scan start time': 0.021783333272, 'ion mobility drift time': 5.425052642822}], 'no combination': ''}, 'MS1 spectrum': '', 'ms level': 1, 'positive scan': '', 'profile spectrum': '', 'base peak m/z': 556.208923, 'base peak intensity': 498.0, 'total ion current': 16544.0, 'count': 3, 'm/z array': array([  84.93608,   84.94011,   84.94413, ..., 1068.4116 , 1068.4259 ,
       1068.4402 ], dtype=float32), 'intensity array': array([-1.79016107e-25,  1.46673247e+24, -1.51178715e-37, -1.54913541e-17,
       -2.04779361e+38,  9.71913394e-08,  1.46286041e-07,  8.39809662e-16,
        2.97575312e-14, -7.86642218e-09,  2.28726658e+15,  5.43110625e+05,
       -1.10746738e-28,  5.17432720e+07,  1.77395478e+30, -7.67273821e-23,
       -2.65578475e+26,  7.64251670e-35, -1.13935533e+21, -3.42280520e+15,
       -1.31124127e+00,  2.51333062e-02, -4.92048502e-01,  2.87613808e+17,
        ....
       -3.52494206e-26, -9.56013077e+16, -3.66199280e+14, -1.30716445e-23,
        2.39365726e+19, -7.26167558e+16, -2.37517262e+01,  2.17031360e-01,
        5.48355529e+29,  2.46857384e-20, -1.47907729e+29, -6.52608461e+17,
       -4.61074589e-23, -1.35001278e+32,  4.02059961e+33,  2.57748238e+26,
        6.37279007e-10, -1.07273133e+09,  2.57022423e+34, -3.83416989e+31,
        5.46218035e+29, -2.68281087e+28,  3.83631401e-19,  1.12839910e-35,
       -1.05270765e+17, -8.27873639e+33, -2.49165326e+20,  1.12048226e+23,
        1.46826276e-35,  2.00541957e+37,  7.42941018e-29, -9.97033520e+07,
        4.83702717e-23, -1.54230037e-07, -8.56633144e+23,  2.74046398e+26,
       -4.12114197e+02,  5.58272716e+37,  5.46802350e+06,  4.70167584e-38,
       -4.00289041e-13], dtype=float32), 'raw ion mobility array': array([0.65100634, 0.65100634, 0.65100634, ..., 9.710844  , 9.710844  ,
       9.710844  ], dtype=float32)}
{'index': 1, 'id': 'merged=2 function=1 block=1', 'defaultArrayLength': 28185, 'dataProcessingRef': 'pwiz_Reader_Waters_conversion', 'scanList': {'count': 1, 'scan': [{'scanWindowList': {'count': 1, 'scanWindow': [{'scan window lower limit': 50.0, 'scan window upper limit': 1200.0}]}, 'preset scan configuration': 1.0, 'scan start time': 0.143583327532, 'ion mobility drift time': 5.425052642822}], 'no combination': ''}, 'MS1 spectrum': '', 'ms level': 1, 'positive scan': '', 'profile spectrum': '', 'base peak m/z': 149.014969, 'base peak intensity': 10352.0, 'total ion current': 449373.0, 'count': 3, 'm/z array': array([ 84.96828,  84.9723 ,  84.97632, ..., 367.58862, 367.59708,
       367.6054 ], dtype=float32), 'intensity array': array([-3.14475813e-21, -3.45934282e+14, -2.35413522e-21, ...,
       -2.37112786e-32,  1.50401835e-36,  1.51039343e-33], dtype=float32), 'raw ion mobility array': array([ 0.48825476,  0.48825476,  0.48825476, ..., 10.578853  ,
       10.578853  , 10.578853  ], dtype=float32)}

Then I got the following error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[81], line 28
     24 from pyteomics import mzml
     26 with mzml.read(
     27     "../../mzML/20230714_6Mix_0_NA_MS2_RP_HDMSe_POS_N02.mzML",) as reader:
---> 28     for spectrum in reader:
     29         print(spectrum)
     32     #     # if spectrum["ms level"] != 1:
     33     #     #     continue
     34     #     print(spectrum["m/z array"].shape)
   (...)
     39         # pyteomics_result.append(_map_spectrum_to_numpy(spectrum))
     40     # pyteomics_result = np.concatenate(pyteomics_result)

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/auxiliary/file_helpers.py:178, in IteratorContextManager.__next__(self)
    176 def __next__(self):
    177     # try:
--> 178     return next(self._reader)

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/xml.py:1239, in Iterfind.__next__(self)
   1237 if self._iterator is None:
   1238     self._iterator = self._make_iterator()
-> 1239 return next(self._iterator)

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/xml.py:598, in XML._iterfind_impl(self, path, **kwargs)
    596                 yield info
    597         else:
--> 598             info = self._get_info_smart(child, **kwargs)
    599             yield info
    600 if not localname == '*':

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/mzml.py:327, in MzML._get_info_smart(self, element, **kw)
    323     info = self._get_info(element,
    324             recursive=(rec if rec is not None else False),
    325             **kwargs)
    326 else:
--> 327     info = self._get_info(element,
    328             recursive=(rec if rec is not None else True),
    329             **kwargs)
    330 if 'binary' in info and isinstance(info, dict):
    331     info = self._handle_binary(info, **kwargs)

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/xml.py:433, in XML._get_info(self, element, **kwargs)
    431 else:
    432     if cname not in schema_info['lists']:
--> 433         info[cname] = self._get_info_smart(child, ename=cname, **kwargs)
    434     else:
    435         info.setdefault(cname, []).append(
    436             self._get_info_smart(child, ename=cname, **kwargs))

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/mzml.py:327, in MzML._get_info_smart(self, element, **kw)
    323     info = self._get_info(element,
    324             recursive=(rec if rec is not None else False),
    325             **kwargs)
    326 else:
--> 327     info = self._get_info(element,
    328             recursive=(rec if rec is not None else True),
    329             **kwargs)
    330 if 'binary' in info and isinstance(info, dict):
    331     info = self._handle_binary(info, **kwargs)

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/xml.py:436, in XML._get_info(self, element, **kwargs)
    433                 info[cname] = self._get_info_smart(child, ename=cname, **kwargs)
    434             else:
    435                 info.setdefault(cname, []).append(
--> 436                     self._get_info_smart(child, ename=cname, **kwargs))
    437 else:
    438     # handle the case where we do not want to unpack all children, but
    439     # *Param tags are considered part of the current entity, semantically
    440     for child in self._find_immediate_params(element, **kwargs):

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/mzml.py:331, in MzML._get_info_smart(self, element, **kw)
    327     info = self._get_info(element,
    328             recursive=(rec if rec is not None else True),
    329             **kwargs)
    330 if 'binary' in info and isinstance(info, dict):
--> 331     info = self._handle_binary(info, **kwargs)
    333 if 'binaryDataArray' in info and isinstance(info, dict):
    334     for array in info.pop('binaryDataArray'):

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/mzml.py:308, in MzML._handle_binary(self, info, **kwargs)
    305     return info
    307 if binary:
--> 308     array = self.decode_data_array(binary, compressed, dtype)
    309 else:
    310     array = np.array([], dtype=dtype)

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/auxiliary/utils.py:287, in BinaryDataArrayTransformer.decode_data_array(self, source, compression_type, dtype)
    285 if isinstance(binary, bytes):
    286     binary = bytearray(binary)
--> 287 array = self._transform_buffer(binary, dtype)
    288 return array

File ~/dev/repos/digital-twin/.venv/lib/python3.11/site-packages/pyteomics/auxiliary/utils.py:261, in BinaryDataArrayTransformer._transform_buffer(self, binary, dtype)
    259 if isinstance(binary, np.ndarray):
    260     return binary.astype(dtype, copy=False)
--> 261 return np.frombuffer(binary, dtype=dtype)

ValueError: buffer size must be a multiple of element size

pyteomics version

pyteomics = {extras = ["xml"], version = "^4.6"}

mobiusklein commented 11 months ago

This error suggests that the mzML file may not be well-formed. The message means that one of the binary arrays was encoded with one binary data type but labeled as another, so the number of decoded bytes is not a whole multiple of the labeled element size.
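For illustration, here is a minimal sketch of how NumPy raises this message, independent of pyteomics. The byte strings are made up for the example and do not come from the file in question. It also shows that a mislabeled width which happens to divide evenly raises no error and silently yields the wrong number of values.

```python
import numpy as np

# Three float32 values occupy 12 bytes; dropping one byte leaves 11,
# which is not a multiple of the 4-byte float32 element size.
payload = np.arange(3, dtype=np.float32).tobytes()
truncated = payload[:-1]

try:
    np.frombuffer(truncated, dtype=np.float32)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Two float64 values are 16 bytes. Reading them as float32 divides evenly,
# so nothing is raised, but four meaningless values come back instead of two.
as_float32 = np.frombuffer(np.array([1.0, 2.0]).tobytes(), dtype=np.float32)
print(as_float32.size)
```

So an array can be mislabeled or left compressed and still decode without an exception, which may be why the first two spectra printed before the third one failed.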

To diagnose this issue, we'd need to be able to access the mzML file. If that's not easily done, you could try initializing the reader with the following:

import numpy as np
from pyteomics import mzml

array_types = {
    "m/z array": np.float64,
    "intensity array": np.float64, # or np.float32
    "raw ion mobility array": np.float32
}

with mzml.MzML(
    "../../mzML/20230714_6Mix_0_NA_MS2_RP_HDMSe_POS_N02.mzML", dtype=array_types) as reader:
    for spectrum in reader:
        print(spectrum)

From looking at the spectra you've printed, the intensity array looks like the one that is most likely labeled incorrectly, given how it swings wildly from very small to very large (and negative) numbers.
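A rough way to automate that visual check is sketched below. The function name `looks_misdecoded` and the `max_abs` threshold are assumptions chosen for this example, not part of pyteomics. It relies on detector counts normally being finite, non-negative, and of moderate magnitude.

```python
import numpy as np

def looks_misdecoded(intensity, max_abs=1e12):
    """Heuristic: flag arrays with non-finite, negative, or absurdly large values.

    max_abs is an arbitrary illustrative threshold, not a pyteomics setting.
    """
    values = np.asarray(intensity, dtype=np.float64)
    if values.size == 0:
        return False
    if not np.all(np.isfinite(values)):
        return True
    return bool(np.any(values < 0) or np.any(np.abs(values) > max_abs))

# Values resembling a normal profile spectrum.
plausible = np.array([0.0, 12.0, 498.0, 35.0], dtype=np.float32)
# A few values copied from the intensity array printed in this issue.
suspicious = np.array([-1.79016107e-25, 1.46673247e+24, -2.04779361e+38],
                      dtype=np.float32)

print(looks_misdecoded(plausible), looks_misdecoded(suspicious))
```

This is only a heuristic: some processing steps can legitimately produce negative intensities, so a positive result is a prompt to inspect the file and not proof of a decoding error.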

How was this mzML file created?

mobiotwin commented 11 months ago

Hi @mobiusklein

Thank you for your reply

Here is the link to download the file. 20230714_6Mix_0_NA_MS2_RP_HDMSe_POS_N02.mzML

The file was created via MSConvert (Docker image) with the following parameters:


    lock_mass = LOCK_MASS_POSITIVE
    arguments = ""
    arguments += f"wine msconvert {file} "
    arguments += "--outdir /out_data "
    arguments += "--mzML "  # write mzML format [default]
    arguments += "--32 "  # set default binary encoding to 32-bit precision
    arguments += "--combineIonMobilitySpectra "
    arguments += f"""--filter "lockmassRefiner mz={lock_mass} tol=0.5" """
    arguments += """--filter "msLevel 1" """

FYI, I was able to read it with pyopenms

mobiusklein commented 11 months ago

Odd, I was able to read the file successfully beyond the third spectrum. I noticed that the intensity array I'm seeing is not oscillating between huge negative and positive numbers, and that it appears to be double-compressed:

<cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
<cvParam cvRef="MS" accession="MS:1002748" name="MS-Numpress short logged float compression followed by zlib compression" value=""/>
<cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>

which could also cause the problem. Could you please try upgrading to the latest version of pynumpress and see if that fixes the problem for you?
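To check which encodings a file declares without going through a reader, something like the following standard-library sketch could be used. The helper names are hypothetical, and the sample string just wraps the three cvParam lines quoted above in a `binaryDataArray` element so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: the cvParam lines from this thread inside a wrapper element.
SAMPLE = """<binaryDataArray encodedLength="0">
  <cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
  <cvParam cvRef="MS" accession="MS:1002748" name="MS-Numpress short logged float compression followed by zlib compression" value=""/>
  <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
  <binary></binary>
</binaryDataArray>"""

def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}' if present."""
    return tag.rsplit("}", 1)[-1]

def describe_binary_arrays(xml_text):
    """Return the cvParam names declared on each binaryDataArray element."""
    root = ET.fromstring(xml_text)
    summaries = []
    for element in root.iter():
        if local_name(element.tag) != "binaryDataArray":
            continue
        names = [child.get("name") for child in element
                 if local_name(child.tag) == "cvParam"]
        summaries.append(names)
    return summaries

summaries = describe_binary_arrays(SAMPLE)
needs_numpress = any("Numpress" in name for names in summaries for name in names)
print(summaries[0], needs_numpress)
```

For a full mzML file, `ET.iterparse` would likely be a better fit than `ET.fromstring`, since it avoids loading the whole document into memory.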

mobiotwin commented 11 months ago

Hi @mobiusklein ,

Yes, it seems to be working. It turns out that I had only installed

pip install pyteomics[xml]

which worked for data without ion mobility.

Thanks for your help
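For anyone hitting the same symptom, a quick check that the optional Numpress decoder is importable in the current environment might look like this. `has_module` is a hypothetical helper written for this example, and the printed hint assumes the package is installed from PyPI under the name `pynumpress`.

```python
import importlib.util

def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None

numpress_available = has_module("pynumpress")
if numpress_available:
    print("pynumpress is importable")
else:
    # Assumption: the decoder is distributed on PyPI as 'pynumpress'.
    print("pynumpress not found; try: pip install pynumpress")
```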