smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Advanced protease cleavage #1215

Closed: rmillikin closed this 6 years ago

rmillikin commented 6 years ago

We have a user who wants to run a search on a sample that was digested with trypsin and AspN at the same time. A custom protease gets close to the intended behavior (cleave at K, R and D), but the cleavage terminus must be either C or N for all of the residues; it can't be set separately for each one.

To support this, we need to pair K and R with the C terminus, and D with the N terminus. The syntax could be:

Sequences inducing cleavage: K,R|D
Cleavage terminus: C|N
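A minimal Python sketch of the proposed pairing, assuming the i-th "|"-separated group of residues goes with the i-th terminus. The names `CleavageRule`, `parse_rules`, `cleavage_sites` and `digest` are hypothetical and not part of MetaMorpheus; missed cleavages and peptide length limits are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleavageRule:
    """Hypothetical rule: a group of residues plus the terminus they cut at."""
    residues: frozenset[str]  # residues that induce cleavage
    terminus: str             # "C": cut after the residue; "N": cut before it


def parse_rules(inducing: str, termini: str) -> list[CleavageRule]:
    """Pair each '|'-separated residue group with the terminus at the same position."""
    groups = inducing.split("|")
    ends = termini.split("|")
    if len(groups) != len(ends):
        raise ValueError("each residue group needs exactly one terminus")
    rules = []
    for group, end in zip(groups, ends):
        end = end.strip().upper()
        if end not in ("C", "N"):
            raise ValueError(f"unknown terminus: {end!r}")
        residues = frozenset(r.strip() for r in group.split(",") if r.strip())
        rules.append(CleavageRule(residues, end))
    return rules


def cleavage_sites(sequence: str, rules: list[CleavageRule]) -> list[int]:
    """Return sorted bond indices i, meaning a cut between sequence[i-1] and sequence[i]."""
    sites = set()
    for i, aa in enumerate(sequence):
        for rule in rules:
            if aa in rule.residues:
                sites.add(i + 1 if rule.terminus == "C" else i)
    # Cuts at the very ends of the protein do not produce new peptides.
    return sorted(s for s in sites if 0 < s < len(sequence))


def digest(sequence: str, rules: list[CleavageRule]) -> list[str]:
    """Split the sequence at every cleavage site (fully specific, no missed cleavages)."""
    bounds = [0, *cleavage_sites(sequence, rules), len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Trypsin (C-terminal to K, R) combined with AspN (N-terminal to D).
rules = parse_rules("K,R|D", "C|N")
print(digest("PEPKTIDERAGDK", rules))  # ['PEPK', 'TI', 'DER', 'AG', 'DK']
```

Keeping the residue groups and termini as parallel "|"-separated lists means an existing single-group protease (e.g. `"K,R"` with `"C"`) parses exactly as before.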

A similar problem exists for sequences preventing cleavage (e.g. LysC combined with trypsin), which could be solved with the same syntax. The same is probably true for cleavage specificity.
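Extending the same idea to residues that prevent cleavage could look like the sketch below. It assumes a third "|"-separated field where an empty group means "nothing prevents cleavage", that for a C-terminal rule the blocking residue is the one after the cut, and for an N-terminal rule the one before it. It also assumes LysC is allowed to cut before P. All names are hypothetical, not MetaMorpheus API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleavageRule:
    """Hypothetical rule: one residue group, its blocking residues and its terminus."""
    inducing: frozenset[str]    # residues that induce cleavage
    preventing: frozenset[str]  # residues across the bond that block the cut
    terminus: str               # "C": cut after the residue; "N": cut before it


def _residues(group: str) -> frozenset[str]:
    return frozenset(r.strip() for r in group.split(",") if r.strip())


def parse_rules(inducing: str, preventing: str, termini: str) -> list[CleavageRule]:
    """Zip the i-th '|' group of each field into one rule (an empty group means none)."""
    fields = [f.split("|") for f in (inducing, preventing, termini)]
    if len({len(f) for f in fields}) != 1:
        raise ValueError("all fields need the same number of '|' groups")
    rules = []
    for ind, prev, term in zip(*fields):
        term = term.strip().upper()
        if term not in ("C", "N"):
            raise ValueError(f"unknown terminus: {term!r}")
        rules.append(CleavageRule(_residues(ind), _residues(prev), term))
    return rules


def cleavage_sites(seq: str, rules: list[CleavageRule]) -> list[int]:
    """Bond indices i (cut between seq[i-1] and seq[i]) allowed by any rule."""
    sites = set()
    for i, aa in enumerate(seq):
        for rule in rules:
            if aa not in rule.inducing:
                continue
            if rule.terminus == "C":
                site = i + 1
                neighbour = seq[i + 1] if i + 1 < len(seq) else ""
            else:
                site = i
                neighbour = seq[i - 1] if i > 0 else ""
            if neighbour not in rule.preventing:
                sites.add(site)
    # Cuts at either end of the protein do not create new peptides.
    return sorted(s for s in sites if 0 < s < len(seq))


# Trypsin alone: K or R, blocked by a following P.
trypsin = parse_rules("K,R", "P", "C")
# Trypsin + LysC (assumed to ignore P): K always cleaves, R is still blocked by P.
trypsin_lysc = parse_rules("K|R", "|P", "C|C")
print(cleavage_sites("AKPLRPSRG", trypsin))       # [8]
print(cleavage_sites("AKPLRPSRG", trypsin_lysc))  # [2, 8]
```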

Alternatively, we could allow combining proteases, but that implementation seems more cumbersome.

Dmorgen commented 6 years ago

If I may chime in: it would be useful to have an extended version of the notation, since some proteases are influenced by the sequence both before and after the cleavage site. It would also be useful to allow proteases with multi-residue sites; for example, one protease cleaves after pairs of basic residues (KK, KR, RR). I would suggest allowing multiple enzymes, each with its own rule, and allowing multiple positions before and after the cleavage site. Also, allow a NOT option, for example: A/S/T,X[P],X,R|[P] means A, S or T four positions before the cleavage site, then any residue except P, then any residue, then R, and the cleavage site must not be followed by P. In this case, each enzyme needs its own separate rules.
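A minimal sketch of how the proposed motif notation could be matched, assuming: commas separate positions, "/" separates alternatives, X is any residue, a bracketed list excludes residues (so X[P] and [P] both mean "not P"), and "|" marks the cleavage site. Positions that would fall outside the protein are treated as non-matching. `parse_position`, `parse_motif` and `motif_sites` are hypothetical names; this is not how MetaMorpheus parses proteases.

```python
import re
from typing import Callable

Predicate = Callable[[str], bool]

# One position: alternatives such as "A/S/T" or the wildcard "X", optionally
# followed by an exclusion list in brackets, e.g. "X[P]"; "[P]" alone means "not P".
_POSITION = re.compile(r"((?:[A-Z](?:/[A-Z])*)?)(?:\[([A-Z](?:/[A-Z])*)\])?")


def parse_position(token: str) -> Predicate:
    """Turn one comma-separated position into a residue test."""
    m = _POSITION.fullmatch(token.strip())
    if m is None or not (m.group(1) or m.group(2)):
        raise ValueError(f"bad position: {token!r}")
    allowed = None if m.group(1) in ("", "X") else set(m.group(1).split("/"))
    excluded = set(m.group(2).split("/")) if m.group(2) else set()
    return lambda aa: (allowed is None or aa in allowed) and aa not in excluded


def parse_motif(motif: str) -> tuple[list[Predicate], list[Predicate]]:
    """Split 'P4,P3,P2,P1|P1',P2',...' into predicates before and after the cut."""
    left, sep, right = motif.partition("|")
    if not sep:
        raise ValueError("motif needs a '|' marking the cleavage site")
    before = [parse_position(t) for t in left.split(",")] if left.strip() else []
    after = [parse_position(t) for t in right.split(",")] if right.strip() else []
    return before, after


def motif_sites(seq: str, motif: str) -> list[int]:
    """Bond indices i (cut between seq[i-1] and seq[i]) where the whole motif matches."""
    before, after = parse_motif(motif)
    sites = []
    for bond in range(1, len(seq)):
        start = bond - len(before)
        # Positions that would fall outside the protein never match.
        if start < 0 or bond + len(after) > len(seq):
            continue
        if all(p(seq[start + k]) for k, p in enumerate(before)) and all(
            p(seq[bond + k]) for k, p in enumerate(after)
        ):
            sites.append(bond)
    return sites


motif = "A/S/T,X[P],X,R|[P]"
print(motif_sites("MSLARGTPGRAKR", motif))  # [5]
print(motif_sites("SLARP", motif))          # []
```

Because alternatives are per position, a motif like K/R,K/R| would also match RK; restricting a protease to exactly KK, KR and RR would need several motifs attached to the same enzyme, which fits the one-set-of-rules-per-enzyme suggestion.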

Thanks! D.

rmillikin commented 6 years ago

Some of this is done in #1240. Namely, you can associate a specific residue inducing cleavage with a residue preventing cleavage and a terminus type.

Still to do:
- Double residues inducing cleavage (e.g. RR or KK)
- Motifs (e.g. RXXXK)