While processing your dataset, I encountered a couple of inconsistencies between the dataset and the overview of the PTMs in the README.
There are 1746 peptides with an acetylation on the N-terminal methionine (e.g. M[+42.010565]APGQLALFSVSDK), which, to my knowledge, is not possible. There is also 1 peptide with an acetylation on the N-terminal glutamine (Q[+42.010565]PAAPP).
Is this a different modification on the side chains of M and Q, or is it an N-terminal acetylation instead?
A second issue is that Cyclized S-CAM-Cys [+39.994915] is not in the dataset. Instead, C[-17.0] is given. The value -17.0 is the mass shift due to the cyclization (loss of ammonia) relative to carbamidomethylated cysteine, whereas +39.994915 is the mass difference relative to unmodified cysteine.
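As a quick check of that arithmetic, here is a minimal sketch. The ammonia mass (17.026549 Da, monoisotopic) is my own assumption and is not taken from the README; the carbamidomethyl value is the one used in the dataset.

```python
# Monoisotopic mass shifts in Da.
CARBAMIDOMETHYL = 57.021464  # value the dataset uses for carbamidomethylation
AMMONIA = 17.026549          # assumed NH3 mass: 14.003074 (N) + 3 * 1.007825 (H)

# Cyclization loses ammonia from carbamidomethylated cysteine, so the shift
# relative to unmodified cysteine is the difference of the two values.
cyclized_vs_cysteine = CARBAMIDOMETHYL - AMMONIA
print(f"{cyclized_vs_cysteine:+.6f}")  # +39.994915
```

If this reading is right, C[-17.0] and [+39.994915] describe the same modification against different reference masses, and -17.0 is a rounded form of -17.026549.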
Finally, the dataset contains phosphorylation only on serine (S), not on threonine (T) or tyrosine (Y) as listed in the README. Are phosphorylated T and Y expected to be missing?
A few smaller inconsistencies in the reported mass shifts:
The dataset uses [+15.994915] for oxidation instead of [+15.99491]
The dataset uses [+100.016044] for succinylation instead of [+101.023869]
The dataset uses [+14.01565] for mono-methylation instead of [+14.015650]
The dataset uses [+28.0313] for di-methylation instead of [+28.031300]
The dataset uses [+42.04695] for tri-methylation instead of [+42.046950]
The dataset uses [-18.0] for pyroglutamate on N-term E instead of [-18.010565]
The dataset uses both [+57.021464] and [+57.0214635] for carbamidomethylation
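For reference, the counts above can be reproduced with something like the following sketch. It assumes every modification is written as a residue letter followed by a signed mass shift in square brackets, as in the examples quoted above; `tally_modifications` and the `example` list are names I made up for illustration, not part of the dataset tooling.

```python
import re
from collections import Counter

# Matches a residue followed by a bracketed mass shift, e.g. "M[+42.010565]".
MOD_PATTERN = re.compile(r"([A-Z])\[([+-][0-9.]+)\]")

def tally_modifications(peptides):
    """Count (residue, mass-shift string, is_first_residue) across peptides."""
    counts = Counter()
    for peptide in peptides:
        for match in MOD_PATTERN.finditer(peptide):
            residue, shift = match.groups()
            # A match at offset 0 means the modified residue is the first one.
            counts[(residue, shift, match.start() == 0)] += 1
    return counts

# Hypothetical input: the two example peptides quoted above.
example = ["M[+42.010565]APGQLALFSVSDK", "Q[+42.010565]PAAPP"]
for (residue, shift, first), n in sorted(tally_modifications(example).items()):
    print(residue, shift, "first residue" if first else "internal", n)
```

Keeping the shift as a string rather than a float is deliberate: it keeps notational variants such as [+57.021464] and [+57.0214635] as separate keys, which is how they show up as inconsistencies.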