levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

ProForma parser can't handle a modification name with colon if it occurs without ontology prefix #98

Closed levitsky closed 1 year ago

levitsky commented 1 year ago

Some modifications, including cation adducts, have a colon in their name, e.g. Cation:Na.

pyteomics.proforma can't handle those when the ontology prefix is not specified:

In [18]: proforma.ProForma.parse('VKATAGDTHLGGED[Cation:Na]FDNRLVNHLATE')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 proforma.ProForma.parse('VKATAGDTHLGGED[Cation:Na]FDNRLVNHLATE')

File ~/py/pyteomics/pyteomics/proforma.py:2037, in ProForma.parse(cls, string)
   2024 @classmethod
   2025 def parse(cls, string):
   2026     '''Parse a ProForma string.
   2027 
   2028     Parameters
   (...)
   2035     ProForma
   2036     '''
-> 2037     return cls(*parse(string))

File ~/py/pyteomics/pyteomics/proforma.py:1592, in parse(sequence)
   1590 if c in VALID_AA:
   1591     if current_aa is not None:
-> 1592         positions.append((current_aa, current_tag() if current_tag else None))
   1593     current_aa = c
   1594 elif c == '[':

File ~/py/pyteomics/pyteomics/proforma.py:1408, in TokenBuffer.__call__(self)
   1407 def __call__(self):
-> 1408     return self.process()

File ~/py/pyteomics/pyteomics/proforma.py:1457, in TagParser.process(self)
   1456 def process(self):
-> 1457     value = super(TagParser, self).process()
   1458     if not isinstance(value, list):
   1459         value = [value]

File ~/py/pyteomics/pyteomics/proforma.py:1398, in TokenBuffer.process(self)
   1396     value = [self._transform(v) for v in self.tokenize()]
   1397 else:
-> 1398     value = self._transform(self.buffer)
   1399 self.reset()
   1400 return value

File ~/py/pyteomics/pyteomics/proforma.py:1451, in TagParser._transform(self, value)
   1450 def _transform(self, value):
-> 1451     tag = process_tag_tokens(value)
   1452     if tag.group_id:
   1453         self.group_ids.add(tag.group_id)

File ~/py/pyteomics/pyteomics/proforma.py:1106, in process_tag_tokens(tokens)
   1104         main_tag = GenericModification(''.join(value))
   1105     else:
-> 1106         tag_type = TagBase.find_by_tag(prefix)
   1107         main_tag = tag_type(value)
   1108 if len(parts) > 1:

File ~/py/pyteomics/pyteomics/proforma.py:82, in PrefixSavingMeta.find_by_tag(self, tag_name)
     80     raise ValueError("tag_name cannot be None!")
     81 tag_name = tag_name.lower()
---> 82 return self.prefix_map[tag_name]

This works fine:

proforma.ProForma.parse('VKATAGDTHLGGED[UNIMOD:Cation:Na]FDNRLVNHLATE')

The ProForma spec says that the Unimod prefix is optional, and omitting it does work for other modification names.

mobiusklein commented 1 year ago

Good catch. This should be straightforward to solve, so long as the part of the modification name before the : doesn't match another tag type. I'll try fixing this tonight.
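To illustrate the idea, here is a minimal standalone sketch of that fallback. It is not the pyteomics implementation: `KNOWN_PREFIXES` and `split_tag` are hypothetical names, and the prefix set is only an illustrative assumption about which tag prefixes a parser might recognize. The text before the first colon is treated as a prefix only if it is a recognized one; otherwise the whole string is kept as the modification name.

```python
# Hypothetical, illustrative set of recognized tag prefixes (lowercased).
# The real parser keeps its own registry; this list is an assumption.
KNOWN_PREFIXES = {
    "unimod", "u", "mod", "m", "resid", "r",
    "xlmod", "x", "gno", "g", "info", "obs", "formula", "glycan",
}


def split_tag(tag):
    """Split 'prefix:value' only when the prefix is a recognized tag type.

    Returns (prefix, value); prefix is None when the text before the first
    colon is not a known prefix, in which case the whole tag is the value.
    """
    prefix, sep, rest = tag.partition(":")
    if sep and prefix.lower() in KNOWN_PREFIXES:
        return prefix.lower(), rest
    # No colon, or an unrecognized prefix: treat the colon as part of the name.
    return None, tag


if __name__ == "__main__":
    for example in ("Cation:Na", "UNIMOD:Cation:Na", "Oxidation"):
        print(example, "->", split_tag(example))
```

With this approach, a name such as Cation:Na is only ambiguous if the text before the colon collides with a registered prefix, which is the caveat mentioned above.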