levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

unable to extract "charge array" with pyteomics.ms2.IndexedMS2 #108

Closed: irleader closed this issue 1 year ago

irleader commented 1 year ago

Hi,

I first converted a .RAW file to an .ms2 file with RawConverter, then used pyteomics.ms2.IndexedMS2 to read the .ms2 file:

from pyteomics import ms2
ms2_file = ms2.IndexedMS2('Tf_AF568_1.ms2')

However, there is no charge array, and when I try to access it I get this error message: "Exception in comms call get_value:

File "miniconda3/lib/python3.9/site-packages/spyder_kernels/comms/commbase.py", line 347, in _handle_remote_call
    self._set_call_return_value(msg_dict, return_value)

File "miniconda3/lib/python3.9/site-packages/spyder_kernels/comms/commbase.py", line 384, in _set_call_return_value
    self._send_message('remote_call_reply', content=content, data=data,

File "miniconda3/lib/python3.9/site-packages/spyder_kernels/comms/frontendcomm.py", line 109, in _send_message
    return super(FrontendComm, self)._send_message(*args, **kwargs)

File "miniconda3/lib/python3.9/site-packages/spyder_kernels/comms/commbase.py", line 247, in _send_message
    buffers = [cloudpickle.dumps(

File "miniconda3/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)

File "miniconda3/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)

File "miniconda3/lib/python3.9/site-packages/pyteomics/ms1.py", line 286, in __reduce_ex__
    self._read_charges, self._dtype_dict, self.encoding, self.block_size, True),

File "miniconda3/lib/python3.9/site-packages/pyteomics/auxiliary/file_helpers.py", line 222, in __getattr__
    return getattr(self._source, attr)

File "miniconda3/lib/python3.9/site-packages/pyteomics/auxiliary/file_helpers.py", line 129, in __getattr__
    return getattr(self.file, attr)

AttributeError: '_io.BufferedReader' object has no attribute '_read_charges'"

I have attached a sample .ms2 file with the extension changed to .txt (GitHub does not allow .ms2 files to be uploaded).

Thanks in advance!

Tf_AF568_1.ms2.txt

levitsky commented 1 year ago

I can reproduce the error, but not upon creation of the IndexedMS2 object. It happens if I try to pickle it:

Input In [5], in <cell line: 1>()
----> 1 pickle.dumps(f)

File ~/py/pyteomics/pyteomics/ms1.py:286, in IndexedMS1.__reduce_ex__(self, protocol)
    283 def __reduce_ex__(self, protocol):
    284     return (self.__class__,
    285         (self._source_init, False, self._convert_arrays,
--> 286             self._read_charges, self._dtype_dict, self.encoding, self.block_size, True),
    287         self.__getstate__())

File ~/py/pyteomics/pyteomics/auxiliary/file_helpers.py:222, in FileReader.__getattr__(self, attr)
    220 if attr == '_source':
    221     raise AttributeError
--> 222 return getattr(self._source, attr)

File ~/py/pyteomics/pyteomics/auxiliary/file_helpers.py:129, in _file_obj.__getattr__(self, attr)
    128 def __getattr__(self, attr):
--> 129     return getattr(self.file, attr)

AttributeError: '_io.BufferedReader' object has no attribute '_read_charges'

This is not file-specific.
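
For readers following along, the mechanism visible in the traceback can be reproduced without pyteomics. The sketch below is only an illustration: BrokenReader, FixedReader and pickling_error are hypothetical names, not pyteomics classes, and the sketch assumes the attribute was simply missing from the instance, which is what the traceback suggests. Because __getattr__ forwards every failed lookup to the underlying file object, the missing attribute is reported against '_io.BufferedReader' rather than against the reader itself.

```python
import io
import pickle


class BrokenReader:
    """Hypothetical reader that forwards unknown attributes to its file object."""

    def __init__(self, raw, read_charges=True):
        self._raw = raw
        self._source = io.BufferedReader(io.BytesIO(raw))
        # `read_charges` is accepted here but never stored on the instance.

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        if attr == '_source':
            raise AttributeError(attr)
        return getattr(self._source, attr)

    def __reduce_ex__(self, protocol):
        # self._read_charges is not set, so the lookup falls through to the file.
        return (self.__class__, (self._raw, self._read_charges))


class FixedReader(BrokenReader):
    """Same hypothetical reader, but the constructor stores the attribute."""

    def __init__(self, raw, read_charges=True):
        super().__init__(raw, read_charges)
        self._read_charges = read_charges


def pickling_error(reader):
    """Return the AttributeError message raised while pickling, or None."""
    try:
        pickle.dumps(reader)
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(pickling_error(BrokenReader(b"S\t1\t1\t500.0\n")))
    print(pickling_error(FixedReader(b"S\t1\t1\t500.0\n")))
```

This also explains why the error surfaced in Spyder without an explicit pickle call: the spyder_kernels frames in the first traceback show cloudpickle.dumps being applied to the object.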

levitsky commented 1 year ago

@irleader The latest master should fix this, please let me know if it works for you.

irleader commented 1 year ago

Hi Lev,

Thanks a lot for your fast fix!

The error is gone, but I still cannot get "charge array". Each spectrum contains only "intensity array", "m/z array" and "params".

I use this command to install pyteomics-4.6a0: pip install git+https://github.com/levitsky/pyteomics@master

Best regards

levitsky commented 1 year ago

Indeed, charge array was never implemented in MS1 and MS2 parsers. Lacking rich personal experience with this format, I was using this publication as documentation for the formats. I see in your file that instead of pairs of [m/z, intensity] there are quadruples of numbers, but I can only guess what they are (and for the fourth number, I don't even have a guess). Do you happen to have any sources that I could use to extend the parsers?

irleader commented 1 year ago

Hi Lev,

"charge array" is in the documentation (https://pyteomics.readthedocs.io/en/latest/api/ms2.html), so I thought it was implemented.

In the ms2 file written by RawConverter, the first column is m/z, the second is intensity, and the third is the peak (fragment ion) charge, which is what I want. The fourth column is the resolution of the fragment ion m/z. The mass spectrometer I am using is a ThermoFisher QE, which has a maximum scan rate of up to 12 Hz at a resolution setting of 17500 at m/z 200. We also set the acquisition resolution to 17500, and the measured resolution of each fragment ion m/z deviates slightly from 17500 but stays close to it.
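
The column layout described above can be sketched in plain Python. This is a minimal illustration, not the pyteomics parser: parse_peak_lines is a hypothetical helper, the key name 'resolution array' is an assumption made for symmetry with the other keys, and the sample values in the usage note are invented. Lines that start with a letter (such as S or Z records) are treated as headers and skipped.

```python
def parse_peak_lines(lines):
    """Split four-column peak lines into parallel lists, one per column."""
    arrays = {
        'm/z array': [],
        'intensity array': [],
        'charge array': [],
        'resolution array': [],  # assumed key name, for illustration only
    }
    for line in lines:
        fields = line.split()
        # Skip blank lines and header records, which start with a letter.
        if not fields or not fields[0][0].isdigit():
            continue
        # Assumes exactly the layout described above; fewer columns raise ValueError.
        mz, intensity, charge, resolution = fields[:4]
        arrays['m/z array'].append(float(mz))
        arrays['intensity array'].append(float(intensity))
        # int(float(...)) tolerates a charge written as "1" or "1.0".
        arrays['charge array'].append(int(float(charge)))
        arrays['resolution array'].append(float(resolution))
    return arrays
```

For example, parse_peak_lines(["S\t1\t1\t445.12", "110.0712 1523.4 1 17480.5"]) skips the S record and returns one entry per column for the single peak line.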

It would be great if you could at least implement "charge array" to extract the third column. Thanks a lot!

Best regards

levitsky commented 1 year ago

Hi @irleader,

The current master version will parse fragment charges by default. You are welcome to try it.

You can control the processing of arrays with the read_charges and convert_arrays parameters.

levitsky commented 1 year ago

Update: added reading the resolutions, too. Use read_resolutions to disable.
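
A usage sketch for readers, assuming a pyteomics build that includes these changes. The parameter names read_charges, convert_arrays and read_resolutions come from the comments above; the helper first_spectrum is hypothetical, and the meaning of convert_arrays=1 (plain NumPy arrays) is an assumption that should be checked against the API documentation.

```python
def first_spectrum(path, charges=True, resolutions=True):
    """Return the first spectrum of an MS2 file as parsed by pyteomics."""
    # Imported here so the sketch can be loaded even where pyteomics is absent.
    from pyteomics import ms2

    with ms2.IndexedMS2(
        path,
        convert_arrays=1,              # assumed: 1 requests plain NumPy arrays
        read_charges=charges,          # False skips the third column
        read_resolutions=resolutions,  # False skips the fourth column
    ) as reader:
        for spectrum in reader:
            return spectrum
    return None
```

With the file from this issue, first_spectrum('Tf_AF568_1.ms2')['charge array'] should then hold the fragment charges. Passing resolutions=False should leave out the resolution values, which are presumably stored under a key such as 'resolution array'.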