lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Overhaul modifications: multiple variable mods, modifications of specific AA at termini #69

lazear closed 1 year ago

lazear commented 1 year ago

This PR adds support for multiple variable modifications, and peptide/protein-terminal modifications that are specific to individual amino acids. This also overhauls both the internal data representation of Peptide, and how modifications are applied. This speeds up fragment index creation significantly (and should reduce memory usage), but it is still somewhat slow for large numbers of variable modifications - something to improve in the future!

The new syntax looks like this:

"variable_mods": {
    "M": [15.9949],
    "^Q": -17.026549,
    "^E": -18.010565,
    "[": 42.010565
},

Either a single floating point number (-18.0) or a list of floating point numbers ([-18.0, -15.2]) can be supplied as modifications.
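
As a rough illustration of the "number or list" rule, here is a minimal Python sketch that normalizes every entry to a list of floats. It is not Sage's actual (Rust) implementation; `normalize_mods` is a hypothetical helper name, and the config is the example above wrapped in braces so that it parses as standalone JSON.

```python
import json


def normalize_mods(raw):
    """Return a dict mapping each modification key to a list of mass shifts.

    Hypothetical helper: accepts either a bare number or a list of numbers
    per key, mirroring the two forms described in this PR.
    """
    normalized = {}
    for key, value in raw.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            normalized[key] = [float(value)]
        elif isinstance(value, list):
            normalized[key] = [float(v) for v in value]
        else:
            raise ValueError(f"unsupported modification value for {key!r}: {value!r}")
    return normalized


# The example from this PR, wrapped in an object so json.loads accepts it.
config = json.loads("""
{
    "variable_mods": {
        "M": [15.9949],
        "^Q": -17.026549,
        "^E": -18.010565,
        "[": 42.010565
    }
}
""")

mods = normalize_mods(config["variable_mods"])
```

After normalization, downstream code only has to handle one shape: each key maps to a list, whether the user wrote a single value or several.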

Modification strings that consist of a terminus symbol alone, with no amino acid after it (such as "[" above, as opposed to "^Q"), will be treated as before: applied to the N/C-terminus of the peptide itself.
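
To make the key syntax concrete, the following Python sketch splits a modification key into an optional terminus symbol and an optional amino acid. It is only an illustration under stated assumptions, not the parser used in Sage: `parse_mod_key` and `TERMINUS_SYMBOLS` are hypothetical names, the C-terminal symbols "$" and "]" do not appear in this PR text and are assumed here, and the descriptive labels attached to each symbol are likewise assumptions.

```python
# Assumed symbol table. Only "^" and "[" appear in this PR's example;
# "$" and "]" and all of the labels are assumptions for illustration.
TERMINUS_SYMBOLS = {
    "^": "peptide N-terminus",
    "$": "peptide C-terminus",
    "[": "protein N-terminus",
    "]": "protein C-terminus",
}


def parse_mod_key(key):
    """Split a key such as "M", "^Q" or "[" into (terminus, residue).

    Either element may be None:
      "M"  -> (None, "M")         residue anywhere in the peptide
      "^Q" -> (label for "^", "Q") residue only when at that terminus
      "["  -> (label for "[", None) the terminus itself, any residue
    """
    if not key:
        raise ValueError("empty modification key")

    if key[0] in TERMINUS_SYMBOLS:
        terminus = TERMINUS_SYMBOLS[key[0]]
        rest = key[1:]
    else:
        terminus = None
        rest = key

    # At most one uppercase ASCII letter may follow the optional symbol.
    if len(rest) > 1 or (rest and not ("A" <= rest <= "Z")):
        raise ValueError(f"invalid modification key: {key!r}")

    return terminus, (rest or None)


# Classify every key from the example config in this PR.
parsed = {k: parse_mod_key(k) for k in ["M", "^Q", "^E", "["]}
```

A key with a terminus but no residue corresponds to the older behaviour described above, while a key with both restricts the modification to a specific amino acid at that terminus.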