lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Move toward proforma compliance in open mod searches #41

Closed jspaezp closed 1 year ago

jspaezp commented 1 year ago

Hello there!

I was wondering if you have considered moving to a more standard way of reporting the peptide sequences from the search engine. I noticed that there has already been a shift (from parentheses to brackets) when peptideshaker-support was added. I think this would be a great addition that makes the data easier to use in downstream applications!

In particular, I am facing an issue where open-search modifications get reported on the last amino acid, which makes it ambiguous whether the mod is really at the terminal position or its location is unknown.

# Current way of reporting an open mod with a variable mod
AWEIRDPEPTIDEM[+15.9949][+xx.xxxx]

# Suggested ProForma-compliant way of reporting it, making explicit that the
# location is unknown
[+xx.xxxx]?AWEIRDPEPTIDEM[+15.9949]
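
The rewrite between the two forms above can be sketched in a few lines of Python. This is only an illustration, not part of Sage: the function name `to_proforma_unlocalized` is hypothetical, and it assumes the input always follows the pattern shown here, where the open-search delta mass is a second bracket group appended directly after a bracketed variable mod on the last residue. A sequence that ends in a single bracket group is left unchanged, because it cannot be told apart from an ordinary localized mod.

```python
import re

# Assumption: the open-search delta mass is the LAST of two adjacent
# bracket groups at the end of the sequence, e.g. "...M[+15.9949][+12.3456]".
_TRAILING_OPEN_MOD = re.compile(r"(.*\])(\[[+-]\d+(?:\.\d+)?\])")


def to_proforma_unlocalized(peptide: str) -> str:
    """Move a trailing open-mod mass to ProForma's unknown-position prefix.

    Hypothetical helper: returns the input unchanged when it does not end
    in two adjacent bracket groups.
    """
    match = _TRAILING_OPEN_MOD.fullmatch(peptide)
    if match is None:
        return peptide
    localized, open_mod = match.groups()
    # ProForma marks a modification of unknown position as "[mod]?SEQUENCE".
    return f"{open_mod}?{localized}"


if __name__ == "__main__":
    print(to_proforma_unlocalized("AWEIRDPEPTIDEM[+15.9949][+12.3456]"))
```

If the delta mass is available as a separate column, it is safer to build the `[mass]?` prefix from that column than to parse it back out of the sequence string.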

LMK what you think! Thanks again for the amazing search engine -Sebastian

Leaving the spec document here for later: https://github.com/HUPO-PSI/ProForma/blob/master/SpecDocument/ProForma_v2_draft15_February2022.pdf

lazear commented 1 year ago

Hi Sebastian,

I definitely think this is a good idea! I also want to revisit localization of open mods at some point.

Are you using a fork that's reporting open mods at the terminus? Sage currently doesn't append delta masses from open searches to the peptide sequence.

Mike

jspaezp commented 1 year ago

oh shoot, mb... that is generated from a downstream tool! I'm sorry! Let me look into it a little more and I'll get back to this issue!