Modification object - Githubissues

sgibb commented 7 years ago

The current Modification object is a simple class that contains mainly the unimod ID, the composition and the avg/mono mass of the modification as integer/double vectors of length 1. Additionally it contains a data.frame named specificity that stores information about the site and the position of the modification, e.g.:

- General:
  Class                  :   Modification
  Accession number/id    :              1
  PSI-MS/Interim Name    :         Acetyl
  Description            :    Acetylation
  Composition            : H(2) C(2) O(1)
  Delta Average Mass     :        42.0367
  Delta Monoisotopic Mass:      42.010565
  Approved               :           TRUE
- Specificity:
    site       position      classification hidden group
1      K       Anywhere            Multiple  FALSE     1
2 N-term     Any N-term            Multiple  FALSE     2
3      C       Anywhere  Post-translational   TRUE     3
4      S       Anywhere  Post-translational   TRUE     4
5 N-term Protein N-term  Post-translational  FALSE     5
6      T       Anywhere  Post-translational   TRUE     6
7      Y       Anywhere Chemical derivative   TRUE     7
8      H       Anywhere Chemical derivative   TRUE     8
- References: use 'references(object)'

Some specificities have additional entries in the unimod database for neutral loss (#3). These entries have their own avg/mono mass (sometimes different from the general modification mass, e.g. Phosphorylation, id=21).

We could create a new class NeutralLoss that stores these information and could be attached to a specificity (which maybe should be also a class, so that we could handle different user-defined locations easier; see #2). But before creating two new classes I like to ask whether anyone has a better idea? Maybe we overcomplicate things. Maybe a data.frame (with some duplicated entries in some columns) would fit and a complicated class hierarchy is just overkill.

Class hierarchy would be:

AbstractModification (VIRTUAL; slots id, name, avgMass, monoMass, composition)
|- NeutralLoss (inherits AbstractModification, no additional slots)
`- Modification (inherits AbstractModification, 
                 additional slots: specificity (list of Specificity)

Specificity (slots: id, site, position, classification, hidden, 
                    neutralLoss (list of NeutralLoss))

vs. a data.frame where all these slots would be columns.

There is the mzID package that has a complex class hierarchy and many classes but in fact just turns a mzIdentML file into a data.frame (nearly identical use case). I don't want to create classes just because it is possible. The user should benefit from them and should be allowed to create modifications for calculateFragments and other functions.

@lgatto do you have a better idea for the data structure?

lgatto commented 7 years ago

I think it depends a bit on the use cases you envision. I am not convinced that several classes are really necessary here, and a data.frame with additional rows, similarly to the table above, seems like it could do the trick.

There could be a helper function to create new modification as new rows in the data.frame, so that the user doesn't need to create them manually - that function could do some checks, fill out values that can be calculated automatically, ...

sgibb commented 7 years ago

Maybe you are right and we should not overcomplicate things. We could avoid a class completely.

I converted all the unimod entries into a data.frame (currently without the NeutralLoss and reference information) and beside a lot of data duplication it takes just around 1 MB of memory:

head(d)
#     id   name description        lastModified approved avgMass  monoMass
# 1    1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.1  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.2  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.3  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.4  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.5  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
#      composition   site       position     classification hidden group
# 1   H(2)C(2)O(1)      K       Anywhere           Multiple  FALSE     1
# 1.1 H(2)C(2)O(1) N-term     Any N-term           Multiple  FALSE     2
# 1.2 H(2)C(2)O(1)      C       Anywhere Post-translational   TRUE     3
# 1.3 H(2)C(2)O(1)      S       Anywhere Post-translational   TRUE     4
# 1.4 H(2)C(2)O(1) N-term Protein N-term Post-translational  FALSE     5
# 1.5 H(2)C(2)O(1)      T       Anywhere Post-translational   TRUE     6
> dim(d)
# [1] 2370   13
> print(object.size(d), units="Kb")
# 908.1 Kb

By converting some of the columns into factor and Rle, removing some useless columns (lastModification, group) and adding around 1000 rows because of incorporating NeutralLoss information that would change a bit. Nevertheless I think it would be acceptable to store the whole unimod.xml in a data.frame (and keep it with the amino acid and element information in data). In that case we could also move xml2 from Depends to Suggests. The reference information is IMHO negligible. If anybody wants to know where a modification was described/published he could look it up at http://unimod.org.

Instead of a Modification class there could be a simple function that creates a modification data.frame for calculateFragments, etc. This function could look up the unimod information in the unimod data.frame.

lgatto commented 7 years ago

I think it's good to keep things as simple as possible, at least in a first stage. If necessary, it's possible to encapsulate the data in a class of the need becomes clear.

sgibb commented 6 years ago

There are three data.frames in the /data directory now (containing all information from uniprot except the references and notes):

library("unimod")

data("elements")
head(elements)

	Name	FullName	AvgMass	MonoMass
H	H	Hydrogen	1.007940	1.007825
2H	2H	Deuterium	2.014102	2.014102
Li	Li	Lithium	6.941000	7.016003
C	C	Carbon	12.010700	12.000000
13C	13C	Carbon13	13.003355	13.003355
N	N	Nitrogen	14.006700	14.003074

data("aminoacids")
head(aminoacids)

	OneLetter	ThreeLetter	FullName	AvgMass	MonoMass	H	C	N	O	S
-	-			0.0000	0.00000	0	0	0	0	0
A	A	Ala	Alanine	71.0779	71.03711	5	3	1	1	0
R	R	Arg	Arginine	156.1857	156.10111	12	6	4	1	0
N	N	Asn	Asparagine	114.1026	114.04293	6	4	2	2	0
D	D	Asp	Aspartic acid	115.0874	115.02694	5	4	1	3	0
C	C	Cys	Cysteine	103.1429	103.00919	5	3	1	1	1

data("modifications")
head(modifications)

Id	Name	Description	Composition	AvgMass	MonoMass	Site	Position	Classification	SpecGroup	LastModified	Approved	Hidden
1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	K	Anywhere	Multiple	1	2017-11-08 16:08:56	TRUE	FALSE
1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	N-term	Any N-term	Multiple	2	2017-11-08 16:08:56	TRUE	FALSE
1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	C	Anywhere	Post-translational	3	2017-11-08 16:08:56	TRUE	TRUE
1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	S	Anywhere	Post-translational	4	2017-11-08 16:08:56	TRUE	TRUE
1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	N-term	Protein N-term	Post-translational	5	2017-11-08 16:08:56	TRUE	FALSE
1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	T	Anywhere	Post-translational	6	2017-11-08 16:08:56	TRUE	TRUE

We could turn each of these data.frames into a DataFrame (with Rle or similar) or into a tibble but I don't think it is necessary because the whole database is small:

print(object.size(modifications), units="KB")
# 725.4 Kb

Currently unimod is a very small package providing just these 3 data.frames and has no dependencies (the hidden functions to create the data.frames need the xml2 package that's why it is in Suggests:).

The aminoacids and elements data.frame could replace MSnbase's amino.acids data.frame and atomic.mass vector (in R/environments.R).

Do you like these data.frames and their format or should we provide something different?

lgatto commented 6 years ago

For anything that is like a data.frame, tidy tools are superior when it comes to data wrangling. Still, I don't think we need to depend on tibble, as the conversion can be done by the user, if required. Unless, of course, we envision some sort or direct analysis ourselves where tibbles would be a better fit.

Yes, I would suggest to use the data in MSnbase and make use of unimod. The latter would probably have to be submitted to Bioconductor first, though.

sgibb commented 6 years ago

While the elements and aminoacids data.frames are very useful now. The modification data.frame is more or less useless. E.g. in topdownr we support 3 modifications (Carbamidomethyl, Acetyl, Met-loss; unimod id 4, 1, 765).

library("unimod")
data("modifications")
subset(modifications, Id %in% c(1, 4, 765) & Classification != "Artefact")

	Id	Name	Description	Composition	AvgMass	MonoMass	Site	Position	Classification	SpecGroup	LastModified	Approved	Hidden
1	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	K	Anywhere	Multiple	1	2017-11-08 16:08:56	TRUE	FALSE
2	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	N-term	Any N-term	Multiple	2	2017-11-08 16:08:56	TRUE	FALSE
3	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	C	Anywhere	Post-translational	3	2017-11-08 16:08:56	TRUE	TRUE
4	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	S	Anywhere	Post-translational	4	2017-11-08 16:08:56	TRUE	TRUE
5	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	N-term	Protein N-term	Post-translational	5	2017-11-08 16:08:56	TRUE	FALSE
6	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	T	Anywhere	Post-translational	6	2017-11-08 16:08:56	TRUE	TRUE
7	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	Y	Anywhere	Chemical derivative	7	2017-11-08 16:08:56	TRUE	TRUE
8	1	Acetyl	Acetylation	H(2) C(2) O	42.0367	42.01056	H	Anywhere	Chemical derivative	8	2017-11-08 16:08:56	TRUE	TRUE
14	4	Carbamidomethyl	Iodoacetamide derivative	H(3) C(2) N O	57.0513	57.02146	C	Anywhere	Chemical derivative	1	2017-10-09 10:27:10	TRUE	FALSE
23	4	Carbamidomethyl	Iodoacetamide derivative	H(3) C(2) N O	57.0513	57.02146	U	Anywhere	Chemical derivative	10	2017-10-09 10:27:10	TRUE	TRUE
24	4	Carbamidomethyl	Iodoacetamide derivative	H(3) C(2) N O	57.0513	57.02146	M	Anywhere	Chemical derivative	11	2017-10-09 10:27:10	TRUE	TRUE
25	4	Carbamidomethyl	Iodoacetamide derivative	H(7) C(3) N O S	105.1588	105.02483	M	Anywhere	Chemical derivative	11	2017-10-09 10:27:10	TRUE	TRUE
1170	765	Met-loss	Removal of initiator methionine from protein N-terminus	H(-9) C(-5) N(-1) O(-1) S(-1)	-131.1961	-131.04048	M	Protein N-term	Co-translational	1	2007-07-15 20:01:35	FALSE	TRUE

While we could use the modifications data.frame to find the modification and its mass difference we also need the "modification rule", e.g. for Acetylation the "Any N-term" rule (so we add the mass if our fragment starts with the N-terminal end of the sequence), for Carbamidomethyl the "C|U" rule (we add the mass if "C" or "U" is present in the sequence) and for Met-loss the "remove M at the beginning" rule (we remove "M" from the start of the sequence and substract 131 from the peptide mass).

Here you could see the implementation in topdownr:

.unimod1 <- function(x, s) {
    i <- startsWith(s, x$seq)
    x$mz[i] <- x$mz[i] + 42.010565
    x
}

.unimod4 <- function(x) {
    iCU <- grep("C|U", x$seq)
    x$mz[iCU] <- x$mz[iCU] + 57.021464
    x
}

.unimod765 <- function(x) {
    gsub("^M([ACGPSTV])", "\\1", x)
}

All these rules could be written in regular expressions (one for finding the pattern and in some cases a second one for replacement). But there is no way for me to write nrow(modifications) == 3458 regular expressions.

Unfortunately sometimes the rule could not be predicted from the Site and Position column because the details are written in special notes, e.g. for Met-loss:

N-terminal initiator methionine is removed by a methionine aminopeptidase from proteins where the residue following the methionine is Ala, Cys, Gly, Pro, Ser, Thr or Val. This is generally the final N-terminal state for proteins where the following residue was a Cys, Pro or Val.

But often the notes just contain less useful information (that's why the notes are not included in the data.frame):

"observed in monoclonal antibodies" "Covalently bound structure in Manglik et al., Fig. 1b-Fig1c. Chemical formula % Sigma catalog entry." "Triton X-114" "GEE (glycine ethyl ester) is a substrate for the enzyme Factor XIII for cross-linking to fibrinogen" ...

I would like to have an interface like calculatePeptideMass(peptideSequence, fixedModifications=unimodIds, variableModifications=unimodIds, neutralLoss=TRUE) that could be used in MSnbase::calculateFragements and similar functions. (Would be great to have impact from mass spec users here for a better interface regarding the fixed/variable modifications.)

What we could do: Writing the regular expressions for 3-10 often used modifications and set everything else as NA. If somebody wants to use a modification with NA pattern he would get a message to open an issue on github for implementing this rule.

Alternatively we just provide the modification data.frame with the delta mass (and remove the other columns) and let the user implement the rule himself (as I did in topdownr). This would at least reduce the need for hardcoding the delta mass.

lgatto commented 6 years ago

What we could do: Writing the regular expressions for 3-10 often used modifications and set everything else as NA. If somebody wants to use a modification with NA pattern he would get a message to open an issue on github for implementing this rule.

I like this approach because it makes the package useful for what you need right now without overwhelming you with tons of unnecessary stuff, but allows users to extend or asks for useful extensions.

sgibb commented 6 years ago

I implemented the first prototype of a function to calculate the mass for peptides and allow fixed custom and unimod modifications (the unimod modifications are used by their short names, colon, site):

library("unimod")

unimod:::.mass("MACE",
               fixedModifications=c("Acetyl:N-term",
                                    "Carbamidomethyl:C"))
# [1] 533.1614
# attr(,"sequence")
# [1] "MACE"

unimod:::.mass("MACE",
               fixedModifications=c("Met-loss:P-M",
                                    "Acetyl:N-term",
                                    "Carbamidomethyl:C"))
# [1] 402.1209
# attr(,"sequence")
# [1] "ACE"

unimod:::.mass(c("ACE", "MACE", "CDE"),
               fixedModifications=c("Met-loss:P-M",
                                    "Acetyl:N-term",
                                    "Carbamidomethyl:C"))
# [1] 402.1209 402.1209 446.1107
# attr(,"sequence")
# [1] "ACE" "ACE" "CDE"

unimod:::.mass(c("ACE", "MACE", "CDE"), fixedModifications="Unknown:420:N-term")
# [1] 723.1397 854.1802 767.1296
# attr(,"sequence")
# [1] "ACE"  "MACE" "CDE"
#
# Applying the default rule for the modification: Unknown:420:N-term
# Please create an issue on: https://github.com/ComputationalProteomicsUnit/unimod/issues/new
# to let us implement the correct rule or if the default one is already correct we could remove 
# this message.

unimod:::.mass(c("ACE", "MACE", "CDE"),
               fixedModifications=data.frame(
                    Id=c("MyModification1",
                         "MyModification2"),
                    Site=c("C", "D"),
                    MonoMass=c(57, 58),
                    stringsAsFactors=FALSE))
# [1] 360.0889 491.1294 462.0787
# attr(,"sequence")
# [1] "ACE"  "MACE" "CDE"

I am going to implement the variable modifications next. @pavel-shliaha, @adder, @yafeng any suggestion for the interface?

Currently I am thinking an additional argument named variableModifications that takes a data.frame with the columns Id, Site (Aminoacid), Location (Position in the peptide chain), DeltaMass would be sufficient.

Does anyone have a good suggestion for a name for this function? I don't like names that contain more or less useless verbs calculateMass, determineMass, getMass (I know we have MSnbase::calculateFragments; I am not sure why we not simply used fragments that time?!). That's why I vote for mass (but this is very generic).

lgatto commented 6 years ago

If you want mass, you might need to consider a method mass,character and use the generic in ProtGenerics. Otherwise, what about pepmass, to get the mass of a peptides (with optional fixed or variable modifications passed as arguments).

sgibb commented 6 years ago

pepmass is much more specific. Thanks. I guess the bioc reviewer will "force" me to provide a method for AAString, AAStringSet and AAStringSetList anyway (they did so for the cleave method in cleaver). So pepmass,character, pepmass,AAString etc. would be fine.

adder commented 6 years ago

Hey, Dataframes sound ok for me. I also usually represent these type of objects as dataframes with in my code, it's sufficiently flexible. It fits nicely with my mainly dplyr/tidyverse oriented workflow :)

Mosty tricky thing is probably specyfing terminal modifiations. Maybe position 0 for N-terminus and length(pep)+1 for C-terminus?

Regarding the function name. If mass is to general, you could also call it peptide_mass.

adder commented 6 years ago

Ok, I was to slow with my comments :) Sorry

lgatto commented 6 years ago

Let's not get into the CamelCam vs snake_case vs alllowsercase debate ;-)

Surely we all agree not to use ALLUPPERCASE.

sgibb commented 6 years ago

I like pepmass because it calculates the mass for peptides (ok it could be a protein as well). I assume we need to create methods because we should support character and AAString*.

I actually wondering what should happen if a variable modification and a fixed modification hit the same site:

both are applied
fixed mod is overwritten (= fixed mod wouldn't be applied but the variable)
the variable mod is ignored

adder commented 6 years ago

I'm not an expert in these matters and I can't supply real biological examples right now but I would say that both should be applied by default. In the case that both can be applied, you allow for this. If the fixed modification blocks the variable modification, it's up to the user to correctly specify the variable modification by not allowing it on the same site that the fixed is on.

A difficult one is the case that the variable blocks a fixed modification, I guess an option variable_only = TRUE could help here. (the default of this option would be FALSE then)

I'm not sure if this is a problem in a real example but what happens if you have 2 variable modifications that can be on the same site?

lgatto commented 6 years ago

I suppose it depends whether the modifications can co-occur or whether they compete. I would say that it is the user's responsibility to make sure that sites undergo only a single modification (whether variable of fixed); if > 1 modifications are provided, I think we should consider all possibilities: mod1 only, mod2 only, mod1 and 2, or none, if both are variable.

I could ask in the lab if this is an issue in practice.

pavel-shliaha commented 6 years ago

I think that 2 modifications can co-exist in principle (chemically), but I have never seen 2 modifications reported on the same residue. I think if you want them both then just create a new modification that contains both of them. Maybe make a function that combines them. And fixed modification should beat variable modification. This is just my opinion of course

The only real example I can think of is trimethylation of lysine. There are 3 modifications.

1) K methyl 14.015650 2) K dimethyl 28.031300 3) K trimethyl 42.046950

the mass of Kme3 modification is exactly identical to Kme2 + Kme1, however I have never seen an identification with 2 modifications Kme2 + Kme1. All modifications Kme3 are reported as Kme3.

pavel-shliaha commented 6 years ago

I agree you should store NL as a column of a dataframe.

pavel-shliaha commented 6 years ago

see below my email exchange with people who work on simultaneous modifications in Mascot. They say mascot does not put 2 modifications on the same reisude

Dear Pavel,

No, Mascot will never suggest two simultaneous modifications on the same residue. In such a case it would try to allocate one of the modifications to a different residue if another possible target is present in the peptide. If you expect to see this, you should specify it as a separate modification.

Best, Tina

From: Pavel V. Shliaha [mailto:pavels@bmb.sdu.dk] Sent: 25. januar 2018 13:42 To: Tina Nybo; Adelina Rogowska-Wrzesinska Subject: 2 modifications on the same residue

Dear Tina and Adelina,

I know you work with some very weird modifications in oxidation field and hence I wanted to ask if you have ever come across residues that could be modified in 2 places, e.g. oxidation + chlorination. If so how do you handle that in a database search. Do you specify modifications separately as dynamic and mascot knows a combination on a single residue is possible or do you create a new modification that contains both the (say chlorination + oxidation?)

Pavel

lgatto commented 6 years ago

So, bottom line is that search engine don't seem to support multiple modification. I suggest that if such a cases arises, to calculate the masses for

mod1 only,
mod2 only,
none if both are variable
mod1 + mode2, but return a message/warning that two modifications appear on the same amino acid.

I don't like the idea that a user has to create a new virtual modification composed of two individual ones. As search engines won't support this, a warning or message should then inform the user.

And to follow up from Pavel's example, trimethylation would be a single modification, of course.

pavel-shliaha commented 6 years ago

@sgibb and @lgatto just a quick opinion from a more top-down perspective

1) @sgibb could you please provide an output you imagine your function will give when you submit a seqeunce and a variable modification.

unimod:::.mass("KKK", varModifications=data.frame( Id=c("acetyl"), Site=c("K"), MonoMass=c(42), stringsAsFactors=FALSE))

will it be a vector of all possible permutations of K modification masses, i.e. singly, doubly and triply acetylated? I can suggest 3 different proteoforms with identical mass KacKK, KKacK and KKKac. How will this be reflected if the output is just monoisotopic mass? (please let me know if you are open to suggestions on these points)

2) As a top-down person can I suggest you create an additional column which by default is all. I sometimes want a fixed/varible modification but not on all K, but only on a particular. E.g. I know first K is acetylated but 2nd one can only be trimethylated.

3) Just to let you know: the vast majority of modifications cannot co-exist with others. The reason for this is simple: a modification needs certain physico-chemical properties to be attached to an amino acid. However other modifications to this residue destroy these properties. Given co-existance is extremely rare I suggest not to not calculate co-existing modifications by default, but perhaps to provide an interface that allows user to say which modifications can co-exist.

Lets assume there are 5 K (KKKKK) residues, each of which can be mono-, di, trimethylated and acetylated. MS1 mass tells us there are 3 methylations and 1 acetylations Even without co-existing modifications we already have a huge space of possibilities of proteoform combinations. E.g. KmeKmeKmeKacK and Kme3KacKKK and so on. If you do consider all can co-exist the number of combinations becomes almost infinite.

sgibb commented 6 years ago

@adder, @lgatto and @pavel-shliaha thanks for your great input and sorry for the delayed answer.

First I have to admit that my understanding of fixed/variable modifications was quite different. So to have everybody on the same page I would define the terms now as follows:

fixed: modification that is always present, could have two characteristics:

all: each residue has the same modification, e.g. KmeKmeKme
specific: just a few residues or a single residue at a specific position was modified and the position is known a priori e.g. methyl at K1: KmeKK, or metyhl at K1 and 2: KmeKmeK

variable: modification could happen at none, one, multiple or all residues without knowing the position a priori.

Currently unimod just supports fixed/all. I am going to implement fixed/specific next.

@sgibb could you please provide an output you imagine your function will give when you submit a seqeunce and a variable modification.
unimod:::.mass("KKK",
varModifications=data.frame(
Id=c("acetyl"),
Site=c("K"),
MonoMass=c(42),
stringsAsFactors=FALSE))

Current output would be:

mass	sequence	modifications
510.3166	KKK	Acetyl:K

(because it is KacKacKac)

will it be a vector of all possible permutations of K modification masses, i.e. singly, doubly and triply acetylated? I can suggest 3 different proteoforms with identical mass KacKK, KKacK and KKKac. How will this be reflected if the output is just monoisotopic mass? (please let me know if you are open to suggestions on these points)

As you assumed currently I just return the monoisotopic mass. So if fixed/specific modifications are available the output would be 426 for KacKK, KKacK and KKKac.

Of course I am open for suggestions and discussions.

As a top-down person can I suggest you create an additional column which by default is all. I sometimes want a fixed/varible modification but not on all K, but only on a particular. E.g. I know first K is acetylated but 2nd one can only be trimethylated.

I think that is what I want to provide with the fixed/specific method.

Just to let you know: the vast majority of modifications cannot co-exist with others. The reason for this is simple: a modification needs certain physico-chemical properties to be attached to an amino acid. However other modifications to this residue destroy these properties. Given co-existance is extremely rare I suggest not to not calculate co-existing modifications by default, but perhaps to provide an interface that allows user to say which modifications can co-exist.

Good suggestion. That would be easier to implement.

Lets assume there are 5 K (KKKKK) residues, each of which can be mono-, di, trimethylated and acetylated. MS1 mass tells us there are 3 methylations and 1 acetylations Even without co-existing modifications we already have a huge space of possibilities of proteoform combinations. E.g. KmeKmeKmeKacK and Kme3KacKKK and so on. If you do consider all can co-exist the number of combinations becomes almost infinite.

I see your point. With the current implementation it doesn't matter because it will only return the monoisotopic mass. But I guess that is not the information you are interested in, or?

rformassspectrometry / unimod

Modification object #6