topdownproteomics / sdk

Software solution for common top-down proteomics tasks
http://www.topdownproteomics.org/
MIT License

Testing equivalency of proteoform sequences #20

Open acesnik opened 6 years ago

acesnik commented 6 years ago

Comparing the chemical formulas and the sequence of the residues should be sufficient to test whether two proteoforms are equivalent, and it would enable comparisons between proteoforms annotated with different PTM databases.
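A minimal sketch of that idea, in Python for illustration only. The data layout (a list of `(residue letter, modification formula)` pairs) and the names `normalize` and `equivalent` are assumptions made for this example, not part of the SDK. Each modification is reduced to an element-count mapping, so the name a given PTM database uses for it never enters the comparison.

```python
def normalize(formula):
    """Drop zero-count elements so {'C': 2, 'N': 0} compares equal to {'C': 2}."""
    return {element: count for element, count in formula.items() if count != 0}


def equivalent(proteoform_a, proteoform_b):
    """Return True if both proteoforms have the same residues and the same
    modification formula at every position.

    Each proteoform is a list of (residue_letter, formula) pairs, where
    formula maps element symbols to counts (an empty dict means unmodified).
    """
    if len(proteoform_a) != len(proteoform_b):
        return False
    return all(
        residue_a == residue_b and normalize(mod_a) == normalize(mod_b)
        for (residue_a, mod_a), (residue_b, mod_b) in zip(proteoform_a, proteoform_b)
    )


# Acetylation (C2H2O) on the serine, written as two hypothetical databases
# might store it: different key order, one with an explicit zero count.
from_database_a = [("M", {}), ("S", {"C": 2, "H": 2, "O": 1}), ("K", {})]
from_database_b = [("M", {}), ("S", {"O": 1, "H": 2, "C": 2, "N": 0}), ("K", {})]

# Phosphorylation (HPO3) at the same site is a different formula.
phosphorylated = [("M", {}), ("S", {"H": 1, "P": 1, "O": 3}), ("K", {})]

print(equivalent(from_database_a, from_database_b))  # True
print(equivalent(from_database_a, phosphorylated))   # False
```

This only covers fully localized modifications with known formulas. It does not attempt to handle mass-only annotations or uncertain sites.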

acesnik commented 6 years ago

Matching with ambiguity

There may be several types of ambiguity:

  1. Modified residues may be isobaric with other amino acids
  2. Structural variations (e.g. glycans) may have the same mass
  3. Mass tags may be attributable to several modifications
  4. Proteoform IDs with ambiguous localizations may need to be compared with each other or with completely characterized proteoforms

Our equivalence test might need to output a match score (similar to BLAST) or a match classification (exact, ambiguous, or non-match).
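As a rough sketch of the classification option, the Python below handles only ambiguous localization (item 4 above). Everything in it is hypothetical: the `Match` enum, the `classify` function, and the simplifying assumption that each proteoform carries at most one modification per distinct formula, stored as a mapping from a formula string to the set of candidate positions.

```python
from enum import Enum


class Match(Enum):
    EXACT = "exact"
    AMBIGUOUS = "ambiguous"
    NON_MATCH = "non-match"


def classify(sequence_a, mods_a, sequence_b, mods_b):
    """Classify two proteoforms as exact, ambiguous, or non-match.

    mods_a and mods_b map a formula string (e.g. "HO3P") to the set of
    zero-based positions where that modification may be located.
    Assumption: at most one modification per distinct formula.
    """
    if sequence_a != sequence_b or mods_a.keys() != mods_b.keys():
        return Match.NON_MATCH

    result = Match.EXACT
    for formula, sites_a in mods_a.items():
        sites_b = mods_b[formula]
        if not sites_a & sites_b:
            # No candidate position in common: the two cannot agree.
            return Match.NON_MATCH
        if len(sites_a) > 1 or len(sites_b) > 1:
            # They may agree, but at least one localization is uncertain.
            result = Match.AMBIGUOUS
    return result


localized = {"HO3P": {1}}        # phosphorylation fixed at position 1
uncertain = {"HO3P": {1, 4}}     # phosphorylation at position 1 or 4
elsewhere = {"HO3P": {4}}        # phosphorylation fixed at position 4

print(classify("MSKPT", localized, "MSKPT", localized))  # Match.EXACT
print(classify("MSKPT", localized, "MSKPT", uncertain))  # Match.AMBIGUOUS
print(classify("MSKPT", localized, "MSKPT", elsewhere))  # Match.NON_MATCH
```

A score in the style of BLAST would need a weighting scheme for each kind of ambiguity, which this sketch does not attempt.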