levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

[proforma] Final and C-terminal modifications are parsed as one #77

Closed by RalfG 1 year ago

RalfG commented 1 year ago

Hi @mobiusklein,

I noticed an issue when parsing ProForma sequences that carry a modification on both the final amino acid and the C-terminus:

>>> from pyteomics import proforma
>>> proforma.parse("[iTRAQ4plex]-EM[U:Oxidation]EVNES[Phospho]PEK[iTRAQ4plex]-[Methyl]")
([('E', None),
  ('M', [UnimodModification('Oxidation', None, None)]),
  ('E', None),
  ('V', None),
  ('N', None),
  ('E', None),
  ('S', [GenericModification('Phospho', None, None)]),
  ('P', None),
  ('E', None),
  ('K', None)],
 {'n_term': [GenericModification('iTRAQ4plex', None, None)],
  'c_term': [GenericModification('iTRAQ4plexMethyl', None, None)],
  'unlocalized_modifications': [],
  'labile_modifications': [],
  'fixed_modifications': [],
  'intervals': [],
  'isotopes': [],
  'group_ids': [],
  'charge_state': None})

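Both modifications (iTRAQ4plex on the final K and Methyl on the C-terminus) get parsed as a single C-terminal modification, iTRAQ4plexMethyl, and the final K is left unmodified. I would expect ('K', [GenericModification('iTRAQ4plex', None, None)]) as the last residue and only Methyl under 'c_term'.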
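To make the expected split concrete, here is a minimal sketch using only the standard library. `split_simple_proforma` is a hypothetical helper written for this report, not part of pyteomics and not how its parser works. It assumes one-letter residues with at most one bracketed tag each, plus optional `[tag]-` and `-[tag]` terminal groups, and it ignores the rest of the ProForma syntax.

```python
import re


def split_simple_proforma(sequence):
    """Split a simple ProForma string into (residues, n_term, c_term).

    Hypothetical illustration only: it does not handle nested brackets,
    multiple tags per residue, ranges, or global modifications.
    """
    n_term = None
    c_term = None

    # A leading "[tag]-" is the N-terminal modification.
    match = re.match(r"\[([^\]]+)\]-", sequence)
    if match:
        n_term = match.group(1)
        sequence = sequence[match.end():]

    # A trailing "-[tag]" is the C-terminal modification. The hyphen is what
    # separates it from a tag attached to the final residue.
    match = re.search(r"-\[([^\]]+)\]$", sequence)
    if match:
        c_term = match.group(1)
        sequence = sequence[:match.start()]

    # Each remaining residue is one uppercase letter with an optional "[tag]".
    residues = [
        (aa, tag or None)
        for aa, tag in re.findall(r"([A-Z])(?:\[([^\]]+)\])?", sequence)
    ]
    return residues, n_term, c_term


residues, n_term, c_term = split_simple_proforma(
    "[iTRAQ4plex]-EM[U:Oxidation]EVNES[Phospho]PEK[iTRAQ4plex]-[Methyl]"
)
print(residues[-1])  # ('K', 'iTRAQ4plex')
print(n_term)        # iTRAQ4plex
print(c_term)        # Methyl
```

The point is only that the hyphen before `[Methyl]` should end the final residue's tag, so the two tags stay separate.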

I tried looking into the parser function, but it seems pretty complex. Maybe you can find the issue more quickly? Thanks in advance!