Open levitsky opened 2 weeks ago
There was a similar discussion with ProForma recently. The N- and/or C-terminal losses are baked into many, if not all, of the terminal modifications in Unimod, but not into all modification definitions. This change appears to apply to the N-terminus, where you have -H. Does the C-terminal -OH need to be handled as well?
Since this doesn't alter behavior for programs that worked prior to the change, barring abstract `kwargs` propagation, it only adds new behavior, which isn't too dangerous. A little bit of testing suggests that terminal formulae won't have issues with trailing or leading `-` symbols either.

I think there should be a `warnings.warn` call when the implicit correction is applied. It should tell users that their input is being altered, so they know to specify the modification correctly in the future. They can use the warnings filtering tools if they decide they need that auto-correction and don't want to see the warning anymore.
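A minimal sketch of what such a warning could look like, using only the standard `warnings` module. The helper `terminal_group_composition` and its `implicit_fix` flag are hypothetical names made up for illustration, not part of Pyteomics; the acetyl delta H2 C2 O is the composition Unimod lists for acetylation.

```python
import warnings
from collections import Counter


def terminal_group_composition(delta, implicit_fix=True):
    """Hypothetical helper (not a Pyteomics function).

    Turns a Unimod-style delta composition into a terminal group
    composition by adding one hydrogen, and warns the caller about it.
    """
    comp = Counter(delta)
    if implicit_fix:
        warnings.warn(
            "Composition was given as a modification delta; adding one H "
            "to use it as a terminal group. Specify the full group "
            "composition to avoid this warning.",
            UserWarning,
            stacklevel=2,
        )
        comp["H"] += 1
    return dict(comp)


# Unimod lists acetylation as a delta of H2 C2 O.
acetyl_delta = {"H": 2, "C": 2, "O": 1}

# Default case: the correction is applied and one warning is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fixed = terminal_group_composition(acetyl_delta)

# A user who wants the auto-correction without the noise can filter it out.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", message="Composition was given")
    terminal_group_composition(acetyl_delta)
```

The `message` argument of `warnings.filterwarnings` is a regular expression matched against the start of the warning text, so a stable message prefix makes the warning easy to silence selectively.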
Thank you for chipping in @mobiusklein!
> This change appears to apply to the N-terminus, where you have -H, does the C terminal -OH need to be handled as well?
If my understanding is correct, this would depend on the Unimod logic regarding this modification's composition and mass, not on where exactly in the sequence the user is applying the modification. Conceivably, if a modification is strictly C-terminal, it would have "OH" subtracted rather than "H". If it's annotated as a side chain mod and the user applies it as C-terminal, though, the correction to apply is still "H". Does that make sense?
If it does, I should look at Unimod and try to understand whether some mods there are C-terminal and require the "OH" correction. We would not have access to this metadata in the `Composition` constructor anyway, so we could only guess based on where the mod is applied, but that makes this whole idea way riskier.
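To make the "H" versus "OH" distinction concrete, here is a rough sketch in plain Python. The annotation labels `side_chain` and `c_term_only` and the function `restore_group` are hypothetical and do not correspond to Unimod field names or Pyteomics code. The amidation delta H N O(-1) is used on the assumption that this is how Unimod lists it.

```python
from collections import Counter

# Atoms assumed to be baked into the delta as a loss, keyed by how the
# modification is annotated. Labels are hypothetical, for illustration only.
IMPLICIT_LOSS = {
    "side_chain": Counter({"H": 1}),
    "c_term_only": Counter({"H": 1, "O": 1}),
}


def restore_group(delta, annotation):
    """Add back the implicitly subtracted atoms to get a full group."""
    comp = Counter(delta)
    comp.update(IMPLICIT_LOSS[annotation])
    # Drop elements whose count cancelled out to zero.
    return {element: n for element, n in comp.items() if n != 0}


# Acetylation is a replacement of H, so the correction is one hydrogen.
acetyl_group = restore_group({"H": 2, "C": 2, "O": 1}, "side_chain")

# Amidation replaces the C-terminal OH with NH2 (assumed delta: H N O(-1)),
# so the correction would be OH rather than H.
amide_group = restore_group({"H": 1, "N": 1, "O": -1}, "c_term_only")
```

The sketch only works because the annotation is passed in explicitly, which is exactly the metadata that would be missing inside a composition constructor.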
After trying to look for C-terminal mods in Unimod, I have not found examples with -OH subtracted (which probably only means my search was weird), but I have seen enough evidence that my generalization may not be useful. As a matter of fact, applying just about anything from Unimod as a "terminal group", instead of as a regular mod on a terminal residue, is risky. There is little we can do to fix it, other than change how the composition calculations work (always add H- and -OH on top). A warning is definitely justified when trying to use normal mod labels in a terminal context; in fact, we could just as well raise an exception. Neither would have helped with the OP's original issue, though, as they specifically assigned the composition for a terminal acetyl group to be the one listed in Unimod, and intercepting that would be tricky.
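The one-hydrogen discrepancy can be checked by hand on N-acetylglycine (C4H7NO3). The following sketch uses plain `Counter` arithmetic rather than the Pyteomics API; the masses are approximate monoisotopic values, and all helper names are made up for this example.

```python
from collections import Counter

# Approximate monoisotopic masses of the elements involved.
MONO = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956}


def total(*parts):
    """Sum several compositions given as element -> count mappings."""
    out = Counter()
    for part in parts:
        out.update(part)
    return out


def mass(comp):
    """Monoisotopic mass of a composition."""
    return sum(MONO[element] * n for element, n in comp.items())


GLY = {"C": 2, "H": 3, "N": 1, "O": 1}   # glycine residue
H_TERM = {"H": 1}                         # default N-terminal group, H-
OH_TERM = {"O": 1, "H": 1}                # default C-terminal group, -OH
ACETYL_DELTA = {"C": 2, "H": 2, "O": 1}   # delta as listed in Unimod
ACETYL_GROUP = {"C": 2, "H": 3, "O": 1}   # full CH3CO- group

# Terminal group given as the full group: correct formula.
correct = total(ACETYL_GROUP, GLY, OH_TERM)

# Unimod delta used in place of the terminal group: one H short.
wrong = total(ACETYL_DELTA, GLY, OH_TERM)

# "Always add H- and -OH on top" of the delta: correct again.
on_top = total(H_TERM, ACETYL_DELTA, GLY, OH_TERM)

print(round(mass(correct) - mass(wrong), 5))
```

The difference between the first two results is exactly the mass of one hydrogen atom, which is the error described in the original report.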
I rolled back item 2 in the proposed solution; trying to do this now raises a `PyteomicsError`. The exception will later include a URL to the notice in the docs about the difference between terminal groups and mod labels (after the updated docs are deployed).
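For readers unfamiliar with the distinction, the following is a purely illustrative sketch of rejecting a mod label in a terminal context. It is not the code in this PR: the exception class is a local stand-in for the library's own error type, and `pick_terminal_group` along with its lookup rules is hypothetical.

```python
class ModLabelAsTerminalGroupError(ValueError):
    """Local stand-in for the library's exception type (assumption)."""


def pick_terminal_group(label, aa_comp, position):
    """Look up a terminal group, refusing plain modification labels.

    position is 'n' or 'c'. Terminal groups are assumed to be stored
    under keys with a hyphen, such as 'H-' and '-OH'.
    """
    key = f"{label}-" if position == "n" else f"-{label}"
    if key in aa_comp:
        return aa_comp[key]
    if label in aa_comp:
        # The label exists, but only as a delta-style modification.
        raise ModLabelAsTerminalGroupError(
            f"{label!r} is defined as a modification label, not as a "
            f"terminal group; define {key!r} with the full group composition."
        )
    raise KeyError(key)


# Toy composition table: two terminal groups and one mod label.
aa_comp = {
    "H-": {"H": 1},
    "-OH": {"O": 1, "H": 1},
    "ac": {"C": 2, "H": 2, "O": 1},
}

n_term = pick_terminal_group("H", aa_comp, "n")

try:
    pick_terminal_group("ac", aa_comp, "n")
    error_message = None
except ModLabelAsTerminalGroupError as exc:
    error_message = str(exc)
```

Failing loudly here trades convenience for correctness: the user has to define the terminal group explicitly, but a silently wrong mass is no longer possible for this case.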
Also, @mobiusklein, apparently numpy 2.0 is now released and `pynumpress` doesn't import with it. Should it be addressed in `pynumpress`?
I'll fix `pynumpress`. I'm guessing all the libraries that depend upon it at build/runtime are going to break similarly.
This PR is in response to user feedback on the mailing list: https://groups.google.com/g/pyteomics/c/X__Vjy_d6r8
The problem

`aa_comp` from Unimod as a terminal modification, which produces incorrect compositions and masses, because terminal modifications should actually be specified by full group composition, while normal side-chain modifications are reduced by a hydrogen (as they represent a difference in compositions).

The proposed solution

`aa_comp`.
`fast_mass2`, but now also works for `Composition` and `calculate_mass`.

Potential concerns

Implicitly adding a hydrogen does not make sense for all modifications on Unimod (not all of them are even modifications). But then, it probably doesn't make sense to specify them as terminal, either.
Can we shoot ourselves in the foot by applying this implicit correction?