levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
Apache License 2.0

MGF Writing error - `TypeError: 'numpy.int64' object is not iterable` #122

Closed CCranney closed 9 months ago

CCranney commented 9 months ago


I create mgf files using your library in one of my programs. I noticed that writing an MGF file with the new 4.6.2 update causes an error that did not exist in the previous version.

I've written up some code that breaks in version 4.6.2 but worked in 4.6:

from numpy import array  # the arrays below were pasted as printed NumPy reprs
from pyteomics import mgf

mgfSpectra = [
    {
        'params': {'pepmass': (1000, None), 'seq': 'peptide1000', 'charge': 1, 'title': 'DECOY_id1000', 'protein': '1/DECOY_protein1000'},
        'm/z array': array([ 999.5,  999.6,  999.7,  999.8,  999.9, 1000. , 1000.1, 1000.2,
            1000.3, 1000.4]),
        'intensity array': array([ 1000.,  2000.,  3000.,  4000.,  5000.,  6000.,  7000.,  8000.,
            9000., 10000.]),
    },
    {
        'params': {'pepmass': (1001, None), 'seq': 'peptide1001', 'charge': 1, 'title': 'DECOY_id1001', 'protein': '1/DECOY_protein1001'},
        'm/z array': array([1000.5, 1000.6, 1000.7, 1000.8, 1000.9, 1001. , 1001.1, 1001.2,
            1001.3, 1001.4]),
        'intensity array': array([ 1000.,  2000.,  3000.,  4000.,  5000.,  6000.,  7000.,  8000.,
            9000., 10000.]),
    },
]

mgf.write(mgfSpectra, '<path-to-file>.mgf')

The error produced in version 4.6.2 is the following:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../venv/lib/python3.9/site-packages/pyteomics/auxiliary/file_helpers.py:599: in helper
    return _func(*call_args, **call_kwargs)
../venv/lib/python3.9/site-packages/pyteomics/mgf.py:784: in write
    output.write(key_value_line(key, val))
../venv/lib/python3.9/site-packages/pyteomics/mgf.py:731: in key_value_line
    return param_formatters.get(key, _default_repr)(key, val) + '\n'
../venv/lib/python3.9/site-packages/pyteomics/mgf.py:612: in _charge_repr
    return '{}={}'.format(k.upper(), aux.Charge(charge) if isinstance(charge, int) else aux.ChargeList(charge))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = [], args = (1,), kwargs = {}

    def __init__(self, *args, **kwargs):
        if args and isinstance(args[0], basestring):
            delim = r'(?:,\s*)|(?:\s*and\s*)'
            self.extend(map(Charge, re.split(delim, args[0])))
        else:
            try:
                super(ChargeList, self).__init__(
                    sorted(set(args[0])), *args[1:], **kwargs)
            except Exception:
>               super(ChargeList, self).__init__(*args, **kwargs)
E               TypeError: 'numpy.int64' object is not iterable

../venv/lib/python3.9/site-packages/pyteomics/auxiliary/structures.py:115: TypeError

levitsky commented 9 months ago

Thank you for reporting!

Curiously, your code works fine for me on 4.6.2 with Python 3.9 (as well as 3.11). The only change I made was to change array to np.array.

levitsky commented 9 months ago

Ah, I changed the charge values to np.int64(1) and then I got the error. I will look into it shortly. As a workaround, converting the charge to a regular int should fix the problem.
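
For anyone hitting the same error, the workaround above could be sketched roughly as follows. The traceback shows that `_charge_repr` checks `isinstance(charge, int)`, and on Python 3 a `numpy.int64` is not an instance of the built-in `int`, so it falls through to `ChargeList`. The helper names `to_plain_int` and `normalize_charges` are hypothetical and not part of pyteomics; the sketch relies only on the standard `operator.index`, which NumPy integer scalars support.

```python
import operator


def to_plain_int(value):
    """Convert an integer-like scalar (e.g. numpy.int64) to a built-in int.

    Built-in ints and strings such as '2+' are returned unchanged.
    Floats raise TypeError instead of being silently truncated.
    """
    if isinstance(value, (int, str)):
        return value
    return int(operator.index(value))


def normalize_charges(spectra):
    """Yield shallow copies of the spectra whose 'charge' holds built-in ints."""
    for spectrum in spectra:
        # Copy the params dict so the caller's spectra are not modified.
        params = dict(spectrum.get('params', {}))
        charge = params.get('charge')
        if isinstance(charge, (list, tuple)):
            params['charge'] = [to_plain_int(c) for c in charge]
        elif charge is not None:
            params['charge'] = to_plain_int(charge)
        yield dict(spectrum, params=params)
```

Assuming `mgf.write` accepts any iterable of spectrum dicts, the call would then become `mgf.write(normalize_charges(mgfSpectra), '<path-to-file>.mgf')`. Calling `int(charge)` directly where the spectra are built is simpler when that code is under your control.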

CCranney commented 9 months ago

Ah! Gotcha. You commented at just the right time: I was about to post my rewrite of the test script and my puzzlement over why it behaved differently from what I posted above. That conversion fixed my problem, thank you! I'll leave the issue open while you investigate, but feel free to close it when you've finished your analysis.