Closed michabirklbauer closed 7 months ago
I did that sort of thing a lot when converting from ProForma to Peprec format (DeepLC input). Since your example input is valid ProForma, that should work. There is probably a bit of overhead when creating the ProForma object from the string representation, though: timeit reports 36.5 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) for the complete function.
from pyteomics.proforma import ProForma

# This version returns only positions
def get_peprec_modification_positions(proforma_string):
    result = []
    p = ProForma.parse(proforma_string)
    if p.n_term is not None:
        for mod in p.n_term:
            result.append(0)
    for i, (aa, mods) in enumerate(p.sequence):
        if mods is not None:
            for mod in mods:
                result.append(i + 1)
    if p.c_term is not None:
        for mod in p.c_term:
            result.append(-1)
    return result

# This version returns the Peprec modification string, i.e. position1|identity1|position2|identity2|...
def get_peprec_modifications(proforma_string):
    result = []
    p = ProForma.parse(proforma_string)
    if p.n_term is not None:
        for mod in p.n_term:
            result.append(f'0|{mod}')
    for i, (aa, mods) in enumerate(p.sequence):
        if mods is not None:
            for mod in mods:
                result.append(f'{i + 1}|{mod}')
    if p.c_term is not None:
        for mod in p.c_term:
            result.append(f'-1|{mod}')
    return '|'.join(result)
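To make the `position1|identity1|position2|identity2|...` layout concrete, here is a minimal sketch that splits such a string back into (position, identity) pairs. The helper name `split_peprec_modifications` and the modification names in the example are made up for illustration; they are not part of pyteomics or DeepLC, and the example string is not actual output of the functions above. The sketch assumes identities never contain a `|` character themselves.

```python
def split_peprec_modifications(peprec_string):
    """Split 'pos1|id1|pos2|id2|...' into [(pos1, id1), (pos2, id2), ...].

    Hypothetical helper for illustration only. Assumes that no identity
    contains the '|' delimiter.
    """
    # '|'.join([]) yields an empty string, so an unmodified peptide
    # arrives here as ''.
    if not peprec_string:
        return []
    fields = peprec_string.split('|')
    if len(fields) % 2 != 0:
        raise ValueError(f'expected position|identity pairs, got {peprec_string!r}')
    # Even-indexed fields are positions, odd-indexed fields are identities.
    return [(int(pos), name) for pos, name in zip(fields[0::2], fields[1::2])]


# Illustrative input: 0 marks the N-terminus, -1 the C-terminus.
pairs = split_peprec_modifications('0|Acetyl|3|Oxidation|-1|Amidated')
```

If a modification tag ever renders with a `|` inside it, the odd/even pairing breaks, so it may be worth checking the rendered identities before joining them.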
@caetera Thanks Vladimir! I just tested it and it works perfectly! And I don't think the overhead is going to matter, as we only compute this once or twice. Should I commit this, or would you prefer to commit it yourself? 😊
Hi @michabirklbauer, happy to help. You are welcome to commit it yourself - you likely know better where it should be in the code.
Alright, will do! Thank you!
get_modification_positions("ARTKQTARKSTGGKAPRKQLATKAARKSAPAT[-79.966331]GGV[+79.966331]KKPHRYRPGTVALRE")
should return (32, 35)
-> [1-based indices of the modification positions]
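For a quick check of the expected positions without any dependency, the counting can be sketched with the standard library alone. This is a simplified stand-in and not the pyteomics-based approach from the thread: the name `positions_from_simple_proforma` is hypothetical, and it assumes the input consists only of single-letter residues with optional `[...]` tags, plus optional `[...]-` N-terminal and `-[...]` C-terminal tags. Anything beyond that subset of ProForma is not handled.

```python
import re

# One bracketed tag without nested brackets, e.g. [+79.966331]
_TAG = r"\[[^\[\]]*\]"


def positions_from_simple_proforma(proforma_string):
    """Return modification positions: 0 = N-term, 1-based residues, -1 = C-term.

    Hypothetical helper; supports only a small subset of ProForma.
    """
    positions = []
    rest = proforma_string

    # N-terminal tags come first and are followed by '-'.
    n_term = re.match(rf"((?:{_TAG})+)-", rest)
    if n_term:
        positions.extend([0] * len(re.findall(_TAG, n_term.group(1))))
        rest = rest[n_term.end():]

    # C-terminal tags come last and are preceded by '-'.
    c_term_count = 0
    c_term = re.search(rf"-((?:{_TAG})+)$", rest)
    if c_term:
        c_term_count = len(re.findall(_TAG, c_term.group(1)))
        rest = rest[:c_term.start()]

    # Each residue letter is matched together with the tags that follow it,
    # so letters inside tags are never counted as residues.
    for index, (_, tags) in enumerate(re.findall(rf"([A-Za-z])((?:{_TAG})*)", rest), start=1):
        positions.extend([index] * len(re.findall(_TAG, tags)))

    positions.extend([-1] * c_term_count)
    return positions


example = "ARTKQTARKSTGGKAPRKQLATKAARKSAPAT[-79.966331]GGV[+79.966331]KKPHRYRPGTVALRE"
print(positions_from_simple_proforma(example))  # [32, 35]
```

The sketch returns a list rather than a tuple; wrapping the result in `tuple(...)` gives the `(32, 35)` form shown above.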