rformassspectrometry / unimod

Amino acid modifications for mass spectrometry

Modification object #6

Open sgibb opened 7 years ago

sgibb commented 7 years ago

The current Modification object is a simple class that contains mainly the unimod ID, the composition and the avg/mono mass of the modification as integer/double vectors of length 1. Additionally it contains a data.frame named specificity that stores information about the site and the position of the modification, e.g.:

- General:
  Class                  :   Modification
  Accession number/id    :              1
  PSI-MS/Interim Name    :         Acetyl
  Description            :    Acetylation
  Composition            : H(2) C(2) O(1)
  Delta Average Mass     :        42.0367
  Delta Monoisotopic Mass:      42.010565
  Approved               :           TRUE
- Specificity:
    site       position      classification hidden group
1      K       Anywhere            Multiple  FALSE     1
2 N-term     Any N-term            Multiple  FALSE     2
3      C       Anywhere  Post-translational   TRUE     3
4      S       Anywhere  Post-translational   TRUE     4
5 N-term Protein N-term  Post-translational  FALSE     5
6      T       Anywhere  Post-translational   TRUE     6
7      Y       Anywhere Chemical derivative   TRUE     7
8      H       Anywhere Chemical derivative   TRUE     8
- References: use 'references(object)'

Some specificities have additional entries in the unimod database for neutral loss (#3). These entries have their own avg/mono mass (sometimes different from the general modification mass, e.g. Phosphorylation, id=21).


We could create a new class NeutralLoss that stores this information and could be attached to a specificity (which maybe should also be a class, so that we could handle different user-defined locations more easily; see #2). But before creating two new classes I'd like to ask whether anyone has a better idea. Maybe we are overcomplicating things. Maybe a data.frame (with some duplicated entries in some columns) would fit and a complicated class hierarchy is just overkill.

Class hierarchy would be:

AbstractModification (VIRTUAL; slots id, name, avgMass, monoMass, composition)
|- NeutralLoss (inherits AbstractModification, no additional slots)
`- Modification (inherits AbstractModification,
                 additional slots: specificity (list of Specificity))

Specificity (slots: id, site, position, classification, hidden, 
                    neutralLoss (list of NeutralLoss))

vs. a data.frame where all these slots would be columns.
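For comparison, the proposed hierarchy could be written down in S4 roughly as follows. This is a minimal sketch assuming the class and slot names listed above; none of these classes exist in unimod, and the Acetyl/K values are taken from the printout at the top of this issue.

```r
library("methods")

## Hypothetical S4 version of the proposed hierarchy (not implemented).
setClass("AbstractModification",
         representation("VIRTUAL",
                        id = "integer", name = "character",
                        avgMass = "numeric", monoMass = "numeric",
                        composition = "character"))

## Same slots as the virtual parent, no additional ones.
setClass("NeutralLoss", contains = "AbstractModification")

## One site/position rule; may carry a list of NeutralLoss objects.
setClass("Specificity",
         slots = c(id = "integer", site = "character",
                   position = "character", classification = "character",
                   hidden = "logical", neutralLoss = "list"))

## A modification owns a list of Specificity objects.
setClass("Modification", contains = "AbstractModification",
         slots = c(specificity = "list"))

acetyl <- new("Modification",
              id = 1L, name = "Acetyl", composition = "H(2) C(2) O(1)",
              avgMass = 42.0367, monoMass = 42.010565,
              specificity = list(
                  new("Specificity", id = 1L, site = "K",
                      position = "Anywhere", classification = "Multiple",
                      hidden = FALSE, neutralLoss = list())))
```

Every nested level (list of Specificity, list of NeutralLoss) would need its own accessors and show methods, whereas the data.frame variant gets subsetting and printing for free; that is the trade-off behind the question above.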

There is the mzID package that has a complex class hierarchy and many classes but in fact just turns a mzIdentML file into a data.frame (nearly identical use case). I don't want to create classes just because it is possible. The user should benefit from them and should be allowed to create modifications for calculateFragments and other functions.

@lgatto do you have a better idea for the data structure?

lgatto commented 7 years ago

I think it depends a bit on the use cases you envision. I am not convinced that several classes are really necessary here, and a data.frame with additional rows, similarly to the table above, seems like it could do the trick.

There could be a helper function to create new modifications as new rows in the data.frame, so that the user doesn't need to create them manually; that function could do some checks, fill in values that can be calculated automatically, ...
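A minimal sketch of such a helper, assuming a modification table with the columns Name, Site, Position, Composition and MonoMass, and a small hand-copied vector of monoisotopic element masses as a stand-in for a full element table. The names addModification() and parseComposition() are hypothetical. The helper rejects duplicate names and unknown elements and computes MonoMass from a unimod-style composition string such as "H(2) C(2) O".

```r
## Hand-copied monoisotopic element masses (stand-in for an element table).
elementMonoMass <- c(H = 1.007825, C = 12, N = 14.003074,
                     O = 15.994915, S = 31.972071)

## Parse "H(2) C(2) O" into a named integer vector c(H = 2, C = 2, O = 1).
parseComposition <- function(x) {
    tokens <- strsplit(trimws(x), "[[:space:]]+")[[1L]]
    element <- sub("\\(.*$", "", tokens)
    count <- rep(1L, length(tokens))          # no "(n)" means one atom
    hasCount <- grepl("\\(", tokens)
    count[hasCount] <- as.integer(
        sub("^.*\\((-?[0-9]+)\\)$", "\\1", tokens[hasCount]))
    setNames(count, element)
}

## Hypothetical helper: check the input and append one row per site.
addModification <- function(mods, name, site, position = "Anywhere",
                            composition) {
    if (name %in% mods$Name)
        stop("modification '", name, "' already exists")
    counts <- parseComposition(composition)
    unknown <- setdiff(names(counts), names(elementMonoMass))
    if (length(unknown))
        stop("unknown element(s): ", paste(unknown, collapse = ", "))
    mass <- sum(counts * elementMonoMass[names(counts)])
    rbind(mods, data.frame(Name = name, Site = site, Position = position,
                           Composition = composition, MonoMass = mass,
                           stringsAsFactors = FALSE))
}

mods <- data.frame(Name = character(), Site = character(),
                   Position = character(), Composition = character(),
                   MonoMass = numeric(), stringsAsFactors = FALSE)
mods <- addModification(mods, "Acetyl", site = "K",
                        composition = "H(2) C(2) O")
mods <- addModification(mods, "Carbamidomethyl", site = "C",
                        composition = "H(3) C(2) N O")
```

For Acetyl this gives 2 × 1.007825 + 2 × 12 + 15.994915 = 42.010565, the Delta Monoisotopic Mass printed at the top of the issue.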

sgibb commented 7 years ago

Maybe you are right and we should not overcomplicate things. We could avoid a class completely.

I converted all the unimod entries into a data.frame (currently without the NeutralLoss and reference information) and, besides a lot of data duplication, it takes just around 1 MB of memory:

head(d)
#     id   name description        lastModified approved avgMass  monoMass
# 1    1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.1  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.2  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.3  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.4  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
# 1.5  1 Acetyl Acetylation 2008-02-15 05:20:02        1 42.0367 42.010565
#      composition   site       position     classification hidden group
# 1   H(2)C(2)O(1)      K       Anywhere           Multiple  FALSE     1
# 1.1 H(2)C(2)O(1) N-term     Any N-term           Multiple  FALSE     2
# 1.2 H(2)C(2)O(1)      C       Anywhere Post-translational   TRUE     3
# 1.3 H(2)C(2)O(1)      S       Anywhere Post-translational   TRUE     4
# 1.4 H(2)C(2)O(1) N-term Protein N-term Post-translational  FALSE     5
# 1.5 H(2)C(2)O(1)      T       Anywhere Post-translational   TRUE     6
> dim(d)
# [1] 2370   13
> print(object.size(d), units="Kb")
# 908.1 Kb

Converting some of the columns into factor and Rle, removing some useless columns (lastModified, group) and adding around 1000 rows to incorporate the NeutralLoss information would change that a bit. Nevertheless I think it would be acceptable to store the whole unimod.xml in a data.frame (and keep it with the amino acid and element information in data). In that case we could also move xml2 from Depends to Suggests. The reference information is IMHO negligible. If anybody wants to know where a modification was described/published, they could look it up at http://unimod.org.

Instead of a Modification class there could be a simple function that creates a modification data.frame for calculateFragments, etc. This function could look up the unimod information in the unimod data.frame.
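A minimal sketch of such a lookup function, assuming a unimod data.frame with Id, Name, Site, Position and MonoMass columns. The three rows below use values quoted in this issue, and both the "Name:Site" key and the function name modificationsFor() are illustrative choices, not an existing API.

```r
## Three rows standing in for the full unimod data.frame.
unimodTable <- data.frame(
    Id = c(1L, 1L, 4L),
    Name = c("Acetyl", "Acetyl", "Carbamidomethyl"),
    Site = c("K", "N-term", "C"),
    Position = c("Anywhere", "Any N-term", "Anywhere"),
    MonoMass = c(42.010565, 42.010565, 57.021464),
    stringsAsFactors = FALSE)

## Hypothetical helper: return the rows matching "Name:Site" keys, in order.
modificationsFor <- function(x, db = unimodTable) {
    i <- match(x, paste(db$Name, db$Site, sep = ":"))
    if (anyNA(i))
        stop("not in the unimod table: ", paste(x[is.na(i)], collapse = ", "))
    db[i, , drop = FALSE]
}

fixed <- modificationsFor(c("Acetyl:N-term", "Carbamidomethyl:C"))
```

The result is an ordinary data.frame, so it could be passed on to calculateFragments-like functions without any new class.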

lgatto commented 7 years ago

I think it's good to keep things as simple as possible, at least in a first stage. If necessary, it's possible to encapsulate the data in a class if the need becomes clear.

sgibb commented 6 years ago

There are three data.frames in the /data directory now (containing all information from Unimod except the references and notes):

library("unimod")

data("elements")
head(elements)
#     Name  FullName   AvgMass  MonoMass
# H      H  Hydrogen  1.007940  1.007825
# 2H    2H Deuterium  2.014102  2.014102
# Li    Li   Lithium  6.941000  7.016003
# C      C    Carbon 12.010700 12.000000
# 13C  13C  Carbon13 13.003355 13.003355
# N      N  Nitrogen 14.006700 14.003074
data("aminoacids")
head(aminoacids)
#   OneLetter ThreeLetter      FullName  AvgMass  MonoMass  H C N O S Se
# -         -                             0.0000   0.00000  0 0 0 0 0  0
# A         A         Ala       Alanine  71.0779  71.03711  5 3 1 1 0  0
# R         R         Arg      Arginine 156.1857 156.10111 12 6 4 1 0  0
# N         N         Asn    Asparagine 114.1026 114.04293  6 4 2 2 0  0
# D         D         Asp Aspartic acid 115.0874 115.02694  5 4 1 3 0  0
# C         C         Cys      Cysteine 103.1429 103.00919  5 3 1 1 1  0
data("modifications")
head(modifications)
Id Name Description Composition AvgMass MonoMass Site Position Classification SpecGroup LastModified Approved Hidden
1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 K Anywhere Multiple 1 2017-11-08 16:08:56 TRUE FALSE
1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 N-term Any N-term Multiple 2 2017-11-08 16:08:56 TRUE FALSE
1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 C Anywhere Post-translational 3 2017-11-08 16:08:56 TRUE TRUE
1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 S Anywhere Post-translational 4 2017-11-08 16:08:56 TRUE TRUE
1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 N-term Protein N-term Post-translational 5 2017-11-08 16:08:56 TRUE FALSE
1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 T Anywhere Post-translational 6 2017-11-08 16:08:56 TRUE TRUE
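As a quick consistency check between the tables, a residue's MonoMass should equal its element counts times the monoisotopic element masses. The sketch below copies five aminoacids rows from the output above; the H, C and N masses come from the elements output, while O (15.994915) and S (31.972071) are standard monoisotopic values that are not visible in the head() output.

```r
## Five rows copied from head(aminoacids) above (monoisotopic values only).
aa <- data.frame(
    OneLetter = c("A", "R", "N", "D", "C"),
    MonoMass = c(71.03711, 156.10111, 114.04293, 115.02694, 103.00919),
    H = c(5, 12, 6, 5, 5), C = c(3, 6, 4, 4, 3), N = c(1, 4, 2, 1, 1),
    O = c(1, 1, 2, 3, 1), S = c(0, 0, 0, 0, 1))

## Monoisotopic element masses (O and S are not in the head() output).
elementMono <- c(H = 1.007825, C = 12, N = 14.003074,
                 O = 15.994915, S = 31.972071)

## Residue mass = element-count matrix %*% element masses.
counts <- as.matrix(aa[, names(elementMono)])
computed <- drop(counts %*% elementMono)
cbind(aa[, c("OneLetter", "MonoMass")], computed)
```

For alanine this is 5 × 1.007825 + 3 × 12 + 14.003074 + 15.994915 = 71.037114, matching the tabulated 71.03711.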

We could turn each of these data.frames into a DataFrame (with Rle or similar) or into a tibble but I don't think it is necessary because the whole database is small:

print(object.size(modifications), units="KB")
# 725.4 Kb

Currently unimod is a very small package providing just these 3 data.frames and has no dependencies (the hidden functions that create the data.frames need the xml2 package, which is why it is in Suggests:).

The aminoacids and elements data.frame could replace MSnbase's amino.acids data.frame and atomic.mass vector (in R/environments.R).

Do you like these data.frames and their format or should we provide something different?

lgatto commented 6 years ago

For anything that is like a data.frame, tidy tools are superior when it comes to data wrangling. Still, I don't think we need to depend on tibble, as the conversion can be done by the user, if required. Unless, of course, we envision some sort of direct analysis ourselves where tibbles would be a better fit.

Yes, I would suggest using the data in MSnbase and making use of unimod. The latter would probably have to be submitted to Bioconductor first, though.

sgibb commented 6 years ago

While the elements and aminoacids data.frames are very useful now, the modifications data.frame is more or less useless. E.g. in topdownr we support 3 modifications (Carbamidomethyl, Acetyl, Met-loss; unimod ids 4, 1, 765).

library("unimod")
data("modifications")
subset(modifications, Id %in% c(1, 4, 765) & Classification != "Artefact")
Id Name Description Composition AvgMass MonoMass Site Position Classification SpecGroup LastModified Approved Hidden
1 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 K Anywhere Multiple 1 2017-11-08 16:08:56 TRUE FALSE
2 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 N-term Any N-term Multiple 2 2017-11-08 16:08:56 TRUE FALSE
3 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 C Anywhere Post-translational 3 2017-11-08 16:08:56 TRUE TRUE
4 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 S Anywhere Post-translational 4 2017-11-08 16:08:56 TRUE TRUE
5 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 N-term Protein N-term Post-translational 5 2017-11-08 16:08:56 TRUE FALSE
6 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 T Anywhere Post-translational 6 2017-11-08 16:08:56 TRUE TRUE
7 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 Y Anywhere Chemical derivative 7 2017-11-08 16:08:56 TRUE TRUE
8 1 Acetyl Acetylation H(2) C(2) O 42.0367 42.01056 H Anywhere Chemical derivative 8 2017-11-08 16:08:56 TRUE TRUE
14 4 Carbamidomethyl Iodoacetamide derivative H(3) C(2) N O 57.0513 57.02146 C Anywhere Chemical derivative 1 2017-10-09 10:27:10 TRUE FALSE
23 4 Carbamidomethyl Iodoacetamide derivative H(3) C(2) N O 57.0513 57.02146 U Anywhere Chemical derivative 10 2017-10-09 10:27:10 TRUE TRUE
24 4 Carbamidomethyl Iodoacetamide derivative H(3) C(2) N O 57.0513 57.02146 M Anywhere Chemical derivative 11 2017-10-09 10:27:10 TRUE TRUE
25 4 Carbamidomethyl Iodoacetamide derivative H(7) C(3) N O S 105.1588 105.02483 M Anywhere Chemical derivative 11 2017-10-09 10:27:10 TRUE TRUE
1170 765 Met-loss Removal of initiator methionine from protein N-terminus H(-9) C(-5) N(-1) O(-1) S(-1) -131.1961 -131.04048 M Protein N-term Co-translational 1 2007-07-15 20:01:35 FALSE TRUE

While we could use the modifications data.frame to find the modification and its mass difference, we also need the "modification rule", e.g. for Acetylation the "Any N-term" rule (we add the mass if our fragment starts at the N-terminal end of the sequence), for Carbamidomethyl the "C|U" rule (we add the mass if "C" or "U" is present in the sequence) and for Met-loss the "remove M at the beginning" rule (we remove "M" from the start of the sequence and subtract 131 from the peptide mass).

Here you can see the implementation in topdownr:

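## Acetyl (unimod 1): add the mass to fragments that start at the
## N-terminus, i.e. fragments that are a prefix of the full sequence s
.unimod1 <- function(x, s) {
    i <- startsWith(s, x$seq)
    x$mz[i] <- x$mz[i] + 42.010565
    x
}

## Carbamidomethyl (unimod 4): add the mass once to every fragment
## that contains a C or U
.unimod4 <- function(x) {
    iCU <- grep("C|U", x$seq)
    x$mz[iCU] <- x$mz[iCU] + 57.021464
    x
}

## Met-loss (unimod 765): remove the initiator M if it is followed by
## A, C, G, P, S, T or V
.unimod765 <- function(x) {
    gsub("^M([ACGPSTV])", "\\1", x)
}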

All these rules could be written in regular expressions (one for finding the pattern and in some cases a second one for replacement). But there is no way for me to write nrow(modifications) == 3458 regular expressions.
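A minimal sketch of what such regex-based rules could look like for the three topdownr modifications, assuming a hypothetical rule table with a find pattern, an optional replacement (NA when the rule only adds mass) and a delta mass added once per match. Sequence-changing rules such as Met-loss are applied first, so the mass can afterwards be computed from the shortened sequence. One deliberate difference from .unimod4 above: this sketch adds the Carbamidomethyl mass per C/U residue rather than once per fragment.

```r
## Hypothetical rule table for three modifications.
regexRules <- data.frame(
    Key = c("Met-loss:P-M", "Acetyl:N-term", "Carbamidomethyl:C"),
    Pattern = c("^M(?=[ACGPSTV])", "^.", "[CU]"),
    Replacement = c("", NA, NA),
    DeltaMass = c(0, 42.010565, 57.021464),
    stringsAsFactors = FALSE)

applyRules <- function(seq, keys, rules = regexRules) {
    r <- rules[match(keys, rules$Key), , drop = FALSE]
    if (anyNA(r$Key))
        stop("no rule for: ", paste(keys[is.na(r$Key)], collapse = ", "))
    ## 1. rules that rewrite the sequence (e.g. remove the initiator M)
    for (i in which(!is.na(r$Replacement)))
        seq <- sub(r$Pattern[i], r$Replacement[i], seq, perl = TRUE)
    ## 2. rules that add DeltaMass once per pattern match
    delta <- numeric(length(seq))
    for (i in which(is.na(r$Replacement))) {
        hits <- regmatches(seq, gregexpr(r$Pattern[i], seq, perl = TRUE))
        delta <- delta + lengths(hits) * r$DeltaMass[i]
    }
    list(sequence = seq, delta = delta)
}

res <- applyRules(c("MACE", "CDE", "MCC"),
                  c("Met-loss:P-M", "Acetyl:N-term", "Carbamidomethyl:C"))
```

Here "MCC" becomes "CC" and gets one acetyl plus two carbamidomethyl deltas.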

Unfortunately, the rule sometimes cannot be inferred from the Site and Position columns because the details are written in special notes, e.g. for Met-loss:

N-terminal initiator methionine is removed by a methionine aminopeptidase from proteins where the residue following the methionine is Ala, Cys, Gly, Pro, Ser, Thr or Val. This is generally the final N-terminal state for proteins where the following residue was a Cys, Pro or Val.

But often the notes just contain less useful information (that's why the notes are not included in the data.frame):

"observed in monoclonal antibodies" "Covalently bound structure in Manglik et al., Fig. 1b-Fig1c. Chemical formula % Sigma catalog entry." "Triton X-114" "GEE (glycine ethyl ester) is a substrate for the enzyme Factor XIII for cross-linking to fibrinogen" ...

I would like to have an interface like calculatePeptideMass(peptideSequence, fixedModifications=unimodIds, variableModifications=unimodIds, neutralLoss=TRUE) that could be used in MSnbase::calculateFragments and similar functions. (It would be great to have input from mass spec users here for a better interface regarding the fixed/variable modifications.)

What we could do: write the regular expressions for 3-10 often used modifications and set everything else to NA. If somebody wants to use a modification with an NA pattern, he would get a message to open an issue on GitHub to implement this rule.
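A minimal sketch of that fallback, assuming a hypothetical rule table in which only some rows have a hand-written Pattern and the rest are NA. For NA rows it falls back to a default rule (simply match the Site residue) and emits a message pointing to the issue tracker. The function name, the table layout and the default rule are illustrative assumptions; the Phospho delta mass (79.966331, HPO3) is included only as an example row without a pattern.

```r
## Hypothetical rule table: only some rows have a hand-written Pattern.
ruleTable <- data.frame(
    Key = c("Acetyl:N-term", "Carbamidomethyl:C", "Phospho:S"),
    Site = c("N-term", "C", "S"),
    Pattern = c("^.", "[CU]", NA),
    MonoMass = c(42.010565, 57.021464, 79.966331),
    stringsAsFactors = FALSE)

## Return the pattern for a modification; rows without a pattern fall back
## to a default rule (match the Site residue) and emit a message.
rulePattern <- function(key, rules = ruleTable) {
    i <- match(key, rules$Key)
    if (is.na(i))
        stop("unknown modification: ", key)
    if (!is.na(rules$Pattern[i]))
        return(rules$Pattern[i])
    message("Applying the default rule for the modification: ", key, "\n",
            "Please create an issue on: ",
            "https://github.com/ComputationalProteomicsUnit/unimod/issues/new")
    rules$Site[i]
}

rulePattern("Carbamidomethyl:C")   # hand-written pattern, no message
rulePattern("Phospho:S")           # default rule, emits the message
```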

Alternatively we just provide the modification data.frame with the delta mass (and remove the other columns) and let the user implement the rule himself (as I did in topdownr). This would at least reduce the need for hardcoding the delta mass.

lgatto commented 6 years ago

> What we could do: write the regular expressions for 3-10 often used modifications and set everything else to NA. If somebody wants to use a modification with an NA pattern, he would get a message to open an issue on GitHub to implement this rule.

I like this approach because it makes the package useful for what you need right now without overwhelming you with tons of unnecessary stuff, but allows users to extend or asks for useful extensions.

sgibb commented 6 years ago

I implemented the first prototype of a function that calculates the mass of peptides and allows fixed custom and unimod modifications (the unimod modifications are specified by their short name, a colon and the site):

library("unimod")

unimod:::.mass("MACE",
               fixedModifications=c("Acetyl:N-term",
                                    "Carbamidomethyl:C"))
# [1] 533.1614
# attr(,"sequence")
# [1] "MACE"

unimod:::.mass("MACE",
               fixedModifications=c("Met-loss:P-M",
                                    "Acetyl:N-term",
                                    "Carbamidomethyl:C"))
# [1] 402.1209
# attr(,"sequence")
# [1] "ACE"

unimod:::.mass(c("ACE", "MACE", "CDE"),
               fixedModifications=c("Met-loss:P-M",
                                    "Acetyl:N-term",
                                    "Carbamidomethyl:C"))
# [1] 402.1209 402.1209 446.1107
# attr(,"sequence")
# [1] "ACE" "ACE" "CDE"

unimod:::.mass(c("ACE", "MACE", "CDE"), fixedModifications="Unknown:420:N-term")
# [1] 723.1397 854.1802 767.1296
# attr(,"sequence")
# [1] "ACE"  "MACE" "CDE"
#
# Applying the default rule for the modification: Unknown:420:N-term
# Please create an issue on: https://github.com/ComputationalProteomicsUnit/unimod/issues/new
# to let us implement the correct rule or if the default one is already correct we could remove 
# this message.

unimod:::.mass(c("ACE", "MACE", "CDE"),
               fixedModifications=data.frame(
                    Id=c("MyModification1",
                         "MyModification2"),
                    Site=c("C", "D"),
                    MonoMass=c(57, 58),
                    stringsAsFactors=FALSE))
# [1] 360.0889 491.1294 462.0787
# attr(,"sequence")
# [1] "ACE"  "MACE" "CDE"
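Judging from the numbers, the printed masses are the sum of the monoisotopic residue masses plus the delta masses, without adding water (H2O, 18.010565). The sketch below reproduces the printed values under that assumption; pepmassSketch() is a hypothetical name, the residue masses are standard monoisotopic values, and only "N-term" and single-residue sites are handled.

```r
## Standard monoisotopic residue masses for the residues used here.
residueMono <- c(A = 71.037114, C = 103.009185, D = 115.026943,
                 E = 129.042593, M = 131.040485)

## Hypothetical re-implementation of the arithmetic: residue masses plus
## fixed delta masses ("N-term" once, residue sites once per occurrence);
## no water is added.
pepmassSketch <- function(seq, fixed) {
    vapply(seq, function(s) {
        aa <- strsplit(s, "")[[1L]]
        mass <- sum(residueMono[aa])
        for (i in seq_len(nrow(fixed))) {
            n <- if (fixed$Site[i] == "N-term") 1L else sum(aa == fixed$Site[i])
            mass <- mass + n * fixed$MonoMass[i]
        }
        mass
    }, numeric(1L), USE.NAMES = FALSE)
}

fixed <- data.frame(Site = c("N-term", "C"),
                    MonoMass = c(42.010565, 57.021464),
                    stringsAsFactors = FALSE)
m1 <- pepmassSketch(c("MACE", "ACE", "CDE"), fixed)

custom <- data.frame(Site = c("C", "D"), MonoMass = c(57, 58),
                     stringsAsFactors = FALSE)
m2 <- pepmassSketch(c("ACE", "MACE", "CDE"), custom)
```

For MACE: 131.040485 + 71.037114 + 103.009185 + 129.042593 + 42.010565 + 57.021464 = 533.161406, printed as 533.1614.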

I am going to implement the variable modifications next. @pavel-shliaha, @adder, @yafeng any suggestion for the interface?

Currently I think an additional argument named variableModifications that takes a data.frame with the columns Id, Site (amino acid), Location (position in the peptide chain) and DeltaMass would be sufficient.

Does anyone have a good suggestion for a name for this function? I don't like names that contain more or less useless verbs like calculateMass, determineMass or getMass (I know we have MSnbase::calculateFragments; I am not sure why we didn't simply use fragments at the time?!). That's why I vote for mass (but this is very generic).

lgatto commented 6 years ago

If you want mass, you might need to consider a method mass,character and use the generic in ProtGenerics. Otherwise, what about pepmass, to get the mass of a peptide (with optional fixed or variable modifications passed as arguments)?

sgibb commented 6 years ago

pepmass is much more specific. Thanks. I guess the bioc reviewer will "force" me to provide a method for AAString, AAStringSet and AAStringSetList anyway (they did so for the cleave method in cleaver). So pepmass,character, pepmass,AAString etc. would be fine.
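A minimal sketch of the generic/method split, assuming the name pepmass. Only a character method is defined, with a small placeholder residue table and no modification handling; an AAString method would presumably just convert and delegate, which is left as a comment because it needs Biostrings.

```r
library("methods")

## Hypothetical generic using the proposed name.
setGeneric("pepmass", function(x, ...) standardGeneric("pepmass"))

## character method: sum of monoisotopic residue masses (placeholder table,
## modifications not handled in this sketch).
setMethod("pepmass", "character", function(x, ...) {
    residueMono <- c(A = 71.037114, C = 103.009185, D = 115.026943,
                     E = 129.042593, K = 128.094963, M = 131.040485)
    vapply(strsplit(x, ""), function(aa) sum(residueMono[aa]), numeric(1L))
})

## With Biostrings loaded, the AAString* methods could delegate, e.g.:
## setMethod("pepmass", "AAString",
##           function(x, ...) pepmass(as.character(x), ...))
```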

adder commented 6 years ago

Hey, data.frames sound OK to me. I also usually represent these types of objects as data.frames within my code; it's sufficiently flexible. It fits nicely with my mainly dplyr/tidyverse-oriented workflow :)

The trickiest thing is probably specifying terminal modifications. Maybe position 0 for the N-terminus and length(pep)+1 for the C-terminus?
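A minimal sketch of that position convention, assuming a hypothetical helper modsAt() that takes a sequence, integer locations and delta masses: 0 stands for the N-terminus, 1 to n for the residues and n + 1 for the C-terminus of a peptide of length n. The C-terminal example uses amidation (-0.984016, i.e. 14.003074 + 1.007825 - 15.994915 for NH replacing O).

```r
## Hypothetical helper: label modification locations with 0 = N-terminus,
## 1..n = residues and n + 1 = C-terminus (n = peptide length).
modsAt <- function(seq, location, deltaMass) {
    n <- nchar(seq)
    if (any(location < 0L | location > n + 1L))
        stop("locations must be between 0 and ", n + 1L)
    residue <- substring(seq, location, location)
    residue[location == 0L] <- "N-term"
    residue[location == n + 1L] <- "C-term"
    data.frame(Location = location, Residue = residue,
               DeltaMass = deltaMass, stringsAsFactors = FALSE)
}

m <- modsAt("ACE", location = c(0L, 2L, 4L),
            deltaMass = c(42.010565, 57.021464, -0.984016))
```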

Regarding the function name. If mass is to general, you could also call it peptide_mass.

adder commented 6 years ago

Ok, I was too slow with my comments :) Sorry

lgatto commented 6 years ago

Let's not get into the CamelCase vs snake_case vs alllowercase debate ;-)

Surely we all agree not to use ALLUPPERCASE.

sgibb commented 6 years ago

I like pepmass because it calculates the mass for peptides (ok it could be a protein as well). I assume we need to create methods because we should support character and AAString*.

I am actually wondering what should happen if a variable modification and a fixed modification hit the same site:

  1. both are applied
  2. the fixed mod is overridden (i.e. the variable mod is applied instead of the fixed one)
  3. the variable mod is ignored

adder commented 6 years ago

I'm not an expert in these matters and I can't supply real biological examples right now but I would say that both should be applied by default. In the case that both can be applied, you allow for this. If the fixed modification blocks the variable modification, it's up to the user to correctly specify the variable modification by not allowing it on the same site that the fixed is on.

A difficult one is the case that the variable blocks a fixed modification, I guess an option variable_only = TRUE could help here. (the default of this option would be FALSE then)

I'm not sure if this is a problem in a real example but what happens if you have 2 variable modifications that can be on the same site?

lgatto commented 6 years ago

I suppose it depends on whether the modifications can co-occur or whether they compete. I would say that it is the user's responsibility to make sure that sites undergo only a single modification (whether variable or fixed); if > 1 modifications are provided, I think we should consider all possibilities: mod1 only, mod2 only, mod1 and 2, or none, if both are variable.
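Enumerating "all possibilities" for k variable modifications on the same site gives 2^k combinations. A minimal sketch, assuming the modifications are given as a named vector of delta masses (the methyl and acetyl values quoted in this thread); variableCombos() is a hypothetical name.

```r
## Hypothetical helper: all on/off combinations of k variable modifications
## competing for one site, with the resulting delta mass.
variableCombos <- function(delta) {
    grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(delta)))
    names(grid) <- names(delta)
    onOff <- as.matrix(grid) + 0        # logical -> 0/1 numeric matrix
    grid$DeltaMass <- drop(onOff %*% delta)
    grid
}

combos <- variableCombos(c(Methyl = 14.015650, Acetyl = 42.010565))
combos
```

The four rows are none, Methyl only, Acetyl only and both; dropping the last row would implement the "no co-existence" policy.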

I could ask in the lab if this is an issue in practice.

pavel-shliaha commented 6 years ago

I think that 2 modifications can co-exist in principle (chemically), but I have never seen 2 modifications reported on the same residue. If you want them both, just create a new modification that contains both of them. Maybe make a function that combines them. And the fixed modification should beat the variable modification. This is just my opinion, of course.

The only real example I can think of is trimethylation of lysine. There are 3 modifications.

1) K methyl 14.015650
2) K dimethyl 28.031300
3) K trimethyl 42.046950

The mass of the Kme3 modification is exactly identical to Kme2 + Kme1; however, I have never seen an identification with the 2 modifications Kme2 + Kme1. All Kme3 modifications are reported as Kme3.
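Written out, the identity follows from each methyl group adding CH2 (12 + 2 × 1.007825 = 14.015650), so the delta mass alone cannot distinguish Kme3 from Kme1 + Kme2 on the same residue:

```latex
\Delta m(\mathrm{Kme1}) + \Delta m(\mathrm{Kme2})
  = 14.015650 + 28.031300
  = 42.046950
  = 3 \times 14.015650
  = \Delta m(\mathrm{Kme3})
```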

pavel-shliaha commented 6 years ago

I agree that you should store neutral losses (NL) as a column of the data.frame.

pavel-shliaha commented 6 years ago

See below my email exchange with people who work on simultaneous modifications in Mascot. They say Mascot does not put 2 modifications on the same residue.


Dear Pavel,

No, Mascot will never suggest two simultaneous modifications on the same residue. In such a case it would try to allocate one of the modifications to a different residue if another possible target is present in the peptide. If you expect to see this, you should specify it as a separate modification.

Best, Tina


From: Pavel V. Shliaha [mailto:pavels@bmb.sdu.dk] Sent: 25. januar 2018 13:42 To: Tina Nybo; Adelina Rogowska-Wrzesinska Subject: 2 modifications on the same residue

Dear Tina and Adelina,

I know you work with some very weird modifications in oxidation field and hence I wanted to ask if you have ever come across residues that could be modified in 2 places, e.g. oxidation + chlorination. If so how do you handle that in a database search. Do you specify modifications separately as dynamic and mascot knows a combination on a single residue is possible or do you create a new modification that contains both the (say chlorination + oxidation?)

Pavel

lgatto commented 6 years ago

So, the bottom line is that search engines don't seem to support multiple modifications on the same residue. I suggest that, if such a case arises, we calculate the masses for

I don't like the idea that a user has to create a new virtual modification composed of two individual ones. As search engines won't support this, a warning or message should then inform the user.

And to follow up from Pavel's example, trimethylation would be a single modification, of course.

pavel-shliaha commented 6 years ago

@sgibb and @lgatto just a quick opinion from a more top-down perspective

1) @sgibb could you please provide the output you imagine your function will give when you submit a sequence and a variable modification?

unimod:::.mass("KKK", varModifications=data.frame( Id=c("acetyl"), Site=c("K"), MonoMass=c(42), stringsAsFactors=FALSE))

will it be a vector of all possible permutations of K modification masses, i.e. singly, doubly and triply acetylated? I can suggest 3 different proteoforms with identical mass KacKK, KKacK and KKKac. How will this be reflected if the output is just monoisotopic mass? (please let me know if you are open to suggestions on these points)
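One way to represent this, as a minimal sketch: return one row per positional isoform instead of a single mass, so that KacKK, KKacK and KKKac stay distinguishable although they share the same delta mass. enumerateIsoforms() is a hypothetical helper for a single variable modification; the acetyl mass is the unimod value quoted earlier in this thread.

```r
## Hypothetical helper: every subset of matching residues is a separate
## positional isoform of one variable modification.
enumerateIsoforms <- function(seq, site, tag, delta) {
    aa <- strsplit(seq, "")[[1L]]
    pos <- which(aa == site)
    out <- NULL
    for (k in 0:length(pos)) {
        ## index subsets via seq_along() to avoid combn()'s scalar pitfall
        idx <- if (k == 0L) list(integer()) else
            combn(seq_along(pos), k, simplify = FALSE)
        for (i in idx) {
            lab <- aa
            lab[pos[i]] <- paste0(lab[pos[i]], tag)
            out <- rbind(out, data.frame(
                Proteoform = paste(lab, collapse = ""),
                nMod = k, DeltaMass = k * delta,
                stringsAsFactors = FALSE))
        }
    }
    out
}

iso <- enumerateIsoforms("KKK", site = "K", tag = "ac", delta = 42.010565)
iso
```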

2) As a top-down person, can I suggest you create an additional column which by default is "all"? I sometimes want a fixed/variable modification not on all K, but only on a particular one. E.g. I know the first K is acetylated but the 2nd one can only be trimethylated.

3) Just to let you know: the vast majority of modifications cannot co-exist with others. The reason for this is simple: a modification needs certain physico-chemical properties to be attached to an amino acid. However, other modifications to this residue destroy these properties. Given that co-existence is extremely rare, I suggest not calculating co-existing modifications by default, but perhaps providing an interface that allows the user to say which modifications can co-exist.

Let's assume there are 5 K residues (KKKKK), each of which can be mono-, di-, trimethylated and acetylated. The MS1 mass tells us there are 3 methylations and 1 acetylation. Even without co-existing modifications we already have a huge space of possible proteoform combinations, e.g. KmeKmeKmeKacK, Kme3KacKKK and so on. If you consider that all can co-exist, the number of combinations becomes almost infinite.
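To put a number on this: assuming each of the five lysines is either unmodified, carries 1 to 3 methyl groups (me1/me2/me3) or is acetylated, with no co-existing modifications, and that the MS1 mass fixes the total at 3 methyl groups and 1 acetyl group, a brute-force count over all 5^5 state vectors gives the size of the candidate space.

```r
## Per-residue state: 0 = unmodified, 1/2/3 = me1/me2/me3, 4 = acetyl.
states <- as.matrix(expand.grid(rep(list(0:4), 5)))   # 5^5 = 3125 rows
nAcetyl <- rowSums(states == 4)
nMethyl <- rowSums(states * (states != 4))            # methyl groups only
hits <- states[nAcetyl == 1 & nMethyl == 3, , drop = FALSE]
nrow(hits)
```

This yields 100 proteoforms (5 positions for the acetyl group times the 20 ways to distribute 3 methyl groups over the remaining 4 lysines), including KmeKmeKmeKacK (1 1 1 4 0) and Kme3KacKKK (3 4 0 0 0).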

sgibb commented 6 years ago

@adder, @lgatto and @pavel-shliaha thanks for your great input and sorry for the delayed answer.

First I have to admit that my understanding of fixed/variable modifications was quite different. So to have everybody on the same page I would define the terms now as follows:

fixed: modification that is always present, could have two characteristics:

  1. all: each residue has the same modification, e.g. KmeKmeKme
  2. specific: just a few residues or a single residue at a specific position was modified and the position is known a priori, e.g. methyl at K1: KmeKK, or methyl at K1 and K2: KmeKmeK

variable: modification could happen at none, one, multiple or all residues without knowing the position a priori.

Currently unimod just supports fixed/all. I am going to implement fixed/specific next.

> @sgibb could you please provide the output you imagine your function will give when you submit a sequence and a variable modification?
>
> unimod:::.mass("KKK",
>                varModifications=data.frame(
>                    Id=c("acetyl"),
>                    Site=c("K"),
>                    MonoMass=c(42),
>                    stringsAsFactors=FALSE))

Current output would be:

mass sequence modifications
510.3166 KKK Acetyl:K

(because it is KacKacKac)

> will it be a vector of all possible permutations of K modification masses, i.e. singly, doubly and triply acetylated? I can suggest 3 different proteoforms with identical mass KacKK, KKacK and KKKac. How will this be reflected if the output is just monoisotopic mass? (please let me know if you are open to suggestions on these points)

As you assumed, I currently just return the monoisotopic mass. So once fixed/specific modifications are available, the output would be 426 for each of KacKK, KKacK and KKKac.

Of course I am open for suggestions and discussions.

> As a top-down person, can I suggest you create an additional column which by default is "all"? I sometimes want a fixed/variable modification not on all K, but only on a particular one. E.g. I know the first K is acetylated but the 2nd one can only be trimethylated.

I think that is what I want to provide with the fixed/specific method.

> Just to let you know: the vast majority of modifications cannot co-exist with others. The reason for this is simple: a modification needs certain physico-chemical properties to be attached to an amino acid. However, other modifications to this residue destroy these properties. Given that co-existence is extremely rare, I suggest not calculating co-existing modifications by default, but perhaps providing an interface that allows the user to say which modifications can co-exist.

Good suggestion. That would be easier to implement.

> Let's assume there are 5 K residues (KKKKK), each of which can be mono-, di-, trimethylated and acetylated. The MS1 mass tells us there are 3 methylations and 1 acetylation. Even without co-existing modifications we already have a huge space of possible proteoform combinations, e.g. KmeKmeKmeKacK, Kme3KacKKK and so on. If you consider that all can co-exist, the number of combinations becomes almost infinite.

I see your point. With the current implementation it doesn't matter because it only returns the monoisotopic mass. But I guess that is not the information you are interested in, is it?