topdownproteomics / sdk

Software solution for common top-down proteomics tasks
http://www.topdownproteomics.org/
MIT License

Add Isotope Support #56

Closed rfellers closed 5 years ago

rfellers commented 6 years ago

Many modifications contain specific isotopes (e.g. 13C) in their atomic compositions. At present, IChemicalFormula doesn't handle isotopes. I can see a couple of ways of supporting this:

  1. Handle the 4 specific Unimod isotopes (http://www.unimod.org/masses.html: 2H, 13C, 15N, 18O) as special element symbols in the element providers. I believe this would cover all the other modification sets (e.g. PSI-MOD) as well, but I'd have to check. This would require the fewest code changes, but we wouldn't have a generic isotope solution. In fact, I started down this path in UnimodHardCodedAtomProvider.

  2. Add an isotope collection to the chemical formula (similar to the UofW ChemicalFormula). This expands/complicates the implementation and could affect performance, but we'd have generic isotope support. Admittedly, this feels cleaner (as single isotopes of an element aren't exactly elements) and it covers us in case they/we start using additional isotopes in the future.
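To make option 1 concrete, here is a minimal sketch of treating the four Unimod isotopes as extra "element" symbols in a mass table. It is written in Python only for brevity and is not the SDK's API: `ELEMENT_MASSES` and `monoisotopic_mass` are hypothetical names, and the masses are approximate values as listed on the Unimod masses page.

```python
from typing import Dict

# Hypothetical provider table: symbol -> monoisotopic mass in Da.
# The four isotope entries are treated as if they were ordinary elements.
ELEMENT_MASSES: Dict[str, float] = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    # Unimod isotope symbols handled as pseudo-elements (option 1)
    "2H": 2.0141017778,
    "13C": 13.0033548378,
    "15N": 15.0001088982,
    "18O": 17.9991610,
}


def monoisotopic_mass(composition: Dict[str, int]) -> float:
    """Sum mass * count over a composition; counts may be negative for deltas."""
    return sum(ELEMENT_MASSES[symbol] * count for symbol, count in composition.items())


# A 13C(6) label written Unimod-style: remove six C, add six 13C.
label_13c6 = {"C": -6, "13C": 6}
delta = monoisotopic_mass(label_13c6)
```

The formula type needs no changes here because "13C" is just another key, which is why this route is the smallest change. The cost is that any isotope outside the table has no representation.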

In this situation, I'd vote for the first option given the focused nature of this library at the moment (ProForma support). But I can completely understand if the group disagrees with me. Or maybe there is another option. Thoughts @acesnik @rmillikin @lschaffer2 or others?

acesnik commented 6 years ago

Thanks for outlining these options, @rfellers!

I think the Unimod tables are okay as a simplified starting point, since the major use case will be writing the chemical formulas of common isotopic labels, which mostly use these isotopes.

For the long term, I strongly prefer option 2. Even in the case of isotopic labels, Unimod leaves out sulfur isotopes, which could be very useful since there are so many stable isotopes of sulfur, and since there are some large mass defects between them. There are also many metal isotopes that can be adducted to proteins.
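For comparison, a rough sketch of what option 2 could look like: the formula keys each entry by element symbol plus an optional mass number, so sulfur or metal isotopes need only a new table row. Again this is Python for brevity with hypothetical names (`ISOTOPE_MASSES`, `DEFAULT_ISOTOPE`, `ChemicalFormula`), not the SDK's or UofW's actual types, and the masses are rounded approximations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical isotope table: (symbol, mass number) -> mass in Da (rounded).
ISOTOPE_MASSES: Dict[Tuple[str, int], float] = {
    ("C", 12): 12.0,
    ("C", 13): 13.0033548378,
    ("S", 32): 31.972071,
    ("S", 34): 33.967867,
}

# Mass number assumed when no specific isotope is requested.
DEFAULT_ISOTOPE: Dict[str, int] = {"C": 12, "S": 32}


@dataclass
class ChemicalFormula:
    # Key: (symbol, mass number or None for the default isotope) -> count
    entries: Dict[Tuple[str, Optional[int]], int] = field(default_factory=dict)

    def add(self, symbol: str, count: int, mass_number: Optional[int] = None) -> None:
        """Add (or subtract, for negative counts) atoms of an element or isotope."""
        key = (symbol, mass_number)
        self.entries[key] = self.entries.get(key, 0) + count

    def monoisotopic_mass(self) -> float:
        total = 0.0
        for (symbol, mass_number), count in self.entries.items():
            number = mass_number if mass_number is not None else DEFAULT_ISOTOPE[symbol]
            total += ISOTOPE_MASSES[(symbol, number)] * count
        return total


# A 34S label expressed as a delta: remove one default S, add one 34S.
label_34s = ChemicalFormula()
label_34s.add("S", -1)
label_34s.add("S", 1, mass_number=34)
delta_34s = label_34s.monoisotopic_mass()
```

The extra lookup per entry hints at the performance and complexity cost raised in the original proposal, while the isotope table stays open-ended.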

acesnik commented 5 years ago

Completed in https://github.com/topdownproteomics/sdk/pull/60#pullrequestreview-178100668?

rfellers commented 5 years ago

Feels completed to me