Open adder opened 8 years ago
Maybe I don't completely understand your request but at least your second suggestion is already possible:
library("MSnbase")
calculateFragments("ACE", modifications=c(C=57.02146))
# Modifications used: C=57.02146
# mz ion type pos z seq
# 1 72.04439 b1 b 1 1 A
# 2 232.07504 b2 b 2 1 AC
# 3 361.11763 b3 b 3 1 ACE
# 4 148.06043 y1 y 1 1 E
# 5 308.09108 y2 y 2 1 CE
# 6 379.12819 y3 y 3 1 ACE
# 7 115.06278 y1_ y_ 1 1 E
# 8 275.09343 y2_ y_ 2 1 CE
# 9 346.13054 y3_ y_ 3 1 ACE
calculateFragments("ACE", modifications=c(C=57.02146, A=-1))
# Modifications used: C=57.02146, A=-1
# mz ion type pos z seq
# 1 71.04439 b1 b 1 1 A
# 2 231.07504 b2 b 2 1 AC
# 3 360.11763 b3 b 3 1 ACE
# 4 148.06043 y1 y 1 1 E
# 5 308.09108 y2 y 2 1 CE
# 6 378.12819 y3 y 3 1 ACE
# 7 115.06278 y1_ y_ 1 1 E
# 8 275.09343 y2_ y_ 2 1 CE
# 9 345.13054 y3_ y_ 3 1 ACE
If I get it correctly, calculateFragments("ACE", modifications=c(C=57.02146, A=-1))
still defines a fixed modification. (So all alanines in the peptide sequence (eg. ACAE) will have a -1 mass shift)
Sometimes you want to define a variable modification (eg. only the second A is modified)
1)
You could this by giving a modification vector with the modification giving at the postion in the vector matching the position in the sequence
calculateFragments("ACAE", fixed.modifications=c(C=57.02146),variable.modifications=c(0,0,-1,0))
2)
the OrgMassSpecR R package solved this by using a lettercode in the peptide sequence and link the lettercode to a modification with list. (see fragmentpeptide() and PeptideSpectrum() functions)
You could this by giving a modification vector with the modification giving at the postion in the vector matching the position in the sequence
eg.
calculateFragments("ACAmE", fixed.modifications=c(C=57.02146),variable.modifications=c(m = -1))
A more biological relevant use case would be phosphorylation. Eg, serine can get phosporylated but not all serines in the protein will be phosporylated.
Say that you have a spectra identified wiht a phosogroup at the 2nd serine in the sequence.
It would be nice to be able to label the spectrum using the identified peptide sequence with calculateFragments()
I hope I'm clear enough this time :)
Ok, thanks for the clarification. Great idea. I like your second solution more than the first. For the first you have to know the length of the peptide sequence before (or count it manually).
@lgatto we changed the way the modification
argument was handled in MSnbase => 1.17.6
. Currently the modification
is added to the amino acid (before the modification replaced the amino acid).
In favor to handle the fixed.modifications
and variable.modifications
in the same way I would prefer to replace the original mass with the modification
(I would use a single argument modification
with upper letters for general/fixed amino acid modifications and lower letters for individual/variable modifications). I know this changes are not user-friendly. Or do you have another suggestion? I am sorry for breaking the API so often.
Sry, we have to rethink the handling of modification once again. We changed to add instead of replace because we allow modifications to Cterm
and Nterm
: https://github.com/lgatto/MSnbase/issues/47#issuecomment-113902500
Nterm
/Cterm
add would be useful (we don't know before which amino acid is at the N/C term, so replace would not be an option)p
is the phosphorylated serine and have to calculate S + p
which would require some preprocessing by the user)MSnbase 1.17.6
; see comment above)Hi, Letters in the sequence to define modification are fine be me. :) Maybe some extra comments: 1) allow multi letter codes (so we are not restricted by the 24 letters in the alphabet to define PTMs) eg, SphosAMPSoxE to define phospo on S and oxidation on second S
2) Not sure what you mean by adding or replacing the original mass with modification mass. But it might be useful to know that modification masses are usually added to original masses. eg the modification defined in the unimod database are defined as mass differences. Not sure if it make a big difference on your code internally
3) Maybe something to consider, nterm en cterm modification might also be variable. If seen following notation in other software. (not sure if there is 'official nomenclature' for this) no modification H-SAMPLE-OH nterm acetylation ace-SAMPLE-OH But since you only have 1 nterm or cterm in peptide sequence, putting it as a fixed modification would work.
4) Finally, since you can work with mzid format in MSnBase. MZidentML also has an specific way to handle modified peptide sequence as defined in MZidentML guidelines. Actually you could use a PSM entry in the mzid format to calculate the fragments and label a spectra. But this is probably something less straightforward to add to the current functions short-term :)
@sgibb - trying to catch up (still travelling, until next week).
A few thoughts
mzIdentML
conventions, as mentioned by @adder in the post above could be a winning strategy in the long run. Something to keep in mind for the future, especially as calculateFragments
seems to be a much used function, based on the feature requests we have.fixed.modifications
and variable.modifications
? Or one that defines which mods are variable (fixed = TRUE
would be the default, and a user would set something like fixed = c(TRUE, FALSE, TRUE, FALSE
).calculateFragments2
function, with a new API that supports everything we need, without worrying about backwards compatibility. When that one becomes stable, we can start a formal deprecation/replacement process. @lgatto A reply on your 2nd comment.
Or one that defines which mods are variable (fixed = TRUE would be the default, and a user would set something like fixed = c(TRUE, FALSE, TRUE, FALSE).
You also have to be able to specify where the variable modificiation is positioned. Eg. if you have a sequence with 2 serines. You can have a Phospo group on the first but not on the second serine.
You also have to be able to specify where the variable modificiation is positioned. Eg. if you have a sequence with 2 serines. You can have a Phospo group on the first but not on the second serine.
Yes, of course. My point here is to handle variable and fixed modifications independently, rather than tangle them together using, for example upper and lower case. These two parameters could be could be lists with positions and masses, ... or anything that makes sense/is convenient in terms of user interface and coding.
@sgibb - what about a Modification
class with slots aa
, position
, mass
, ... We could then define a list of Modification
(or a Modifications
class) the user could choose from, or create their own, and then pass them to the calculateFragments
function. I don't want us to over-engineer this, but it might be a reasonable possibility.
Adding a new class would be surely the cleanest solution and even support unimod would be great but is that what the user really wants?
The mzIdentMl code is very similar to your class suggestion:
<Modification location="10" residues="K" monoisotopicMassDelta="138.0680796">
<cvParam accession="XLMOD:02001" cvRef="XLMOD" name="DSS"/>
<cvParam accession="MS:1002509" cvRef="PSI-MS" name="cross-link donor" value="11309529182388590588"/>
</Modification>
Would be something like this pseudocode be more useable?
mod <- Modification(residues="C", location=2, avgMassDelta=57.02146)
calculateFragments("ACE", modifications=mod)
Or for the to be written unimod package:
library(unimod)
m <- unimod::query(id=4)
calculateFragments("ACE", modifications=mod)
Sorry for being so slow and quiet regarding this topic. I think the best way would be to provide a Modification
class (with unimod support). That's why I started (the long planned) unimod package. @adder and @lgatto you are welcome to contribute.
The planned class: https://github.com/ComputationalProteomicsUnit/unimod/blob/master/R/AllClasses.R (nothing implemented yet).
~BTW: @lgatto I wanted to develop it under the umbrella of the CPU but I don't have write access to the CPU/unimod repository. So I had to fork it. We could change this at any time.~
EDIT: urls updated to CPU
Excellent initiative! I have now given you admin rights to the CPU repo.
upvote +1
I am also interested to know if this pseudocode has been implemented already?
mod <- Modification(residues="C", location=2, avgMassDelta=57.02146) calculateFragments("ACE", modifications=mod)
It can be very useful to define variable modifications when calculating fragment ions.
For those people who wants to predict MS2 fragments with variable modifications, there is a temporary solution here. https://www.github.com/yafeng/SpectrumAI/
the function predict_MS2_spectrum
is written in Spectra_functions.R
script.
it works like this:
fragments = predict_ms2_spectrum("+229.163EGNNPAEN+0.985GDAK+229.163R", product_ion_charge = 1)
"+229.163EGNNPAEN+0.985GDAK+229.163R"
is the Peptide sequence output from MSGFplus.
It only generates b, y ions (no neutral loss, up to charge state 2).
@yafeng Unfortunately I had no time to implement it yet. I have to fix a few issues in MSnbase
before. Next I would focus on unimod to provide a unique interface for modifications. Subsequently I am going to work on this issue. If you have interest in the unimod
package help is very welcome.
Hi there, what is the current status of this issue? I recently had to implement mirror plotting of peptide spectra with variable modifications. E.g. only one out of several methionines oxidised. Is there any help needed?
Unfortunately I have much to do with my real job. We started to implement the unimod modifications in the unimod package. Currently it can download and parse the data from https://unimod.org and can calculate the mass of given sequences with fixed modifications.
We would need this functionality in more than one package (MSnbase
, Pbase
, topdownr
, ...). So your help would be very welcome. Please don't hesitate to create issues in the unimod
repository if you want to help and/or need some explanation what we did.
Hi, It seems you can only define fixed modifications with calculateFragments(). Am I right? If not, how to do it? I think a feature like this could be useful. Posible ways of implementing this are: 1) Provide a vector of masses/modifications with length = peptide sequence length. The position in the vector corresponds to the position in the peptide sequence
2) use lettercode in the peptide sequence and link the lettercode to a modification wit a names list. (much like fragmentpeptide() and PeptideSpectrum() in the OrgMassSpecR R package)
What do you guys think?