levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

C-terminal modification parsing in ProForma #107

Closed caetera closed 1 year ago

caetera commented 1 year ago

When a peptide has both a fixed modification on the last amino acid and a C-terminal modification, the two are fused together during parsing. Please see the examples below.

A modification on the last amino acid alone parses as expected:

> from pyteomics.proforma import ProForma
> ProForma.parse('PEPK[TMT6plex]')
ProForma([('P', None), ('E', None), ('P', None), ('K', [GenericModification('TMT6plex', None, None)])], {'n_term': None, 'c_term': None, 'unlocalized_modifications': [], 'labile_modifications': [], 'fixed_modifications': [], 'intervals': [], 'isotopes': [], 'group_ids': [], 'charge_state': None})

A C-terminal modification alone also parses as expected:

> ProForma.parse('PEPK-[Amidation]')
ProForma([('P', None), ('E', None), ('P', None), ('K', None)], {'n_term': None, 'c_term': [GenericModification('Amidation', None, None)], 'unlocalized_modifications': [], 'labile_modifications': [], 'fixed_modifications': [], 'intervals': [], 'isotopes': [], 'group_ids': [], 'charge_state': None})

When the two are combined, the modification names are fused and interpreted as a single C-terminal modification:

> ProForma.parse('PEPK[TMT6plex]-[Amidation]')
ProForma([('P', None), ('E', None), ('P', None), ('K', None)], {'n_term': None, 'c_term': [GenericModification('TMT6plexAmidation', None, None)], 'unlocalized_modifications': [], 'labile_modifications': [], 'fixed_modifications': [], 'intervals': [], 'isotopes': [], 'group_ids': [], 'charge_state': None})

mobiusklein commented 1 year ago

Are you using the latest release? I'm not able to reproduce your third case.

In [1]: from pyteomics import proforma
In [2]: pep = proforma.ProForma.parse("PEPK[TMT6plex]-[Amidation]")
In [3]: pep
Out[3]: ProForma([('P', None), ('E', None), ('P', None), ('K', [GenericModification('TMT6plex', None, None)])], {'n_term': None, 'c_term': [GenericModification('Amidation', None, None)], 'unlocalized_modifications': [], 'labile_modifications': [], 'fixed_modifications': [], 'intervals': [], 'isotopes': [], 'group_ids': [], 'charge_state': None})

caetera commented 1 year ago

Hi @mobiusklein,

I was using 4.5.3 (not too outdated), but updating to the latest release, 4.5.6, does solve the problem.
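
For anyone hitting the same symptom, here is a minimal sketch for checking whether a local install is affected. It relies only on what this thread shows: 4.5.3 produced the fused `TMT6plexAmidation` name and 4.5.6 did not. The helper names (`version_tuple`, `has_fused_c_term`, `check_installed`) and the `KNOWN_GOOD` constant are hypothetical, and the check inspects the `repr` of the parsed object rather than any particular attribute of `ProForma`. Which intermediate release actually contains the fix is not stated here, so treat the version comparison as a hint only.

```python
from importlib import metadata

# Assumption: the thread only confirms that 4.5.3 showed the bug and 4.5.6
# did not; the exact release that introduced the fix is not stated.
KNOWN_GOOD = (4, 5, 6)


def version_tuple(text):
    """Turn '4.5.6' into (4, 5, 6); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fused_c_term(parsed_repr):
    """True if the repr shows the two modification names glued together."""
    return "TMT6plexAmidation" in parsed_repr


def check_installed():
    """Return (installed_version, is_affected), or None if pyteomics is absent."""
    try:
        installed = metadata.version("pyteomics")
    except metadata.PackageNotFoundError:
        return None
    # Imported lazily so the helpers above work without pyteomics installed.
    from pyteomics import proforma

    pep = proforma.ProForma.parse("PEPK[TMT6plex]-[Amidation]")
    return installed, has_fused_c_term(repr(pep))


if __name__ == "__main__":
    result = check_installed()
    if result is None:
        print("pyteomics is not installed")
    else:
        installed, affected = result
        print("pyteomics", installed, "affected:", affected)
        if version_tuple(installed) < KNOWN_GOOD:
            print("older than 4.5.6; consider upgrading")
```

If the script reports `affected: True`, upgrading (for example with `pip install --upgrade pyteomics`) is what resolved it in this thread.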