mobiusklein / brainpy

A Python implementation of Baffling Recursive Algorithm for Isotopic distributioN calculations
http://mobiusklein.github.io/brainpy
Apache License 2.0
19 stars 10 forks source link

Difference in about 7.5 m/z with predicted and experimental spectra. #4

Open IgnacioJPickering opened 3 years ago

IgnacioJPickering commented 3 years ago

I'm not a mass spec person, but I've recently had to analyza a compound with formula that has multiple copper and sulphur atoms, and brainpy gives me pretty large deviations from the experimental spectra; in particular, I get a ~ 7.5 m/z deviation for all peaks in the spectrum. These deviations are not observed in other spectra MS prediction software, such as the online applet https://www.sisweb.com/mstools/isotope.htm In particular, for a molecular formula H60C46Cu3Se1N3 calling brainpy with:

from brainpy import isotopic_variants
m = {'H': 60, 'C': 46, 'Cu':3, 'Se': 1, 'N':3}
theoretical_isotopic_cluster = isotopic_variants(m, npeaks=10, charge=0)
for peak in theoretical_isotopic_cluster:
    print(peak.mz, peak.intensity)

Produces as a result

923.1840377386 0.0022227601516974682
924.1872973029356 0.0011455754382822827
925.3231309664317 0.02666552921958156
926.3839931791374 0.03269740350959813
927.4379433948883 0.10527109615812123
928.4370417263187 0.07597124620984472
929.5678061770984 0.24311678266903156
930.5656801824274 0.1288335183195178
931.637296795344 0.2601097788982547
932.6444988578364 0.12396630942607062

It seems to me that brainpy is predicting 931.64 to be the m/z with the maximum intensity, but https://www.sisweb.com/mstools/isotope.htm using high resolution, H60C46Cu3Se1N3 and minimum abundance of 0.01% predicts

924.18108 0.81
924.18381 6.83
924.18602 0.08
924.18637 23.57
924.18647 4.02
924.1874 37.21
924.18794 0.06
924.18857 0.05

(note that these are only some example peaks which contain the max value) which means 924.1874 is the m/z with maximum intensity, and these results agree 100% with the experimental spectrum, which makes me suspect a bug in brainpy.

mobiusklein commented 3 years ago

Based upon the scale of the m/z deltas your reporting, what you're seeing is isotopic fine structure, where the delta m/z between peaks is the difference between the masses of neutrons of different elements rather than differences between number of extra neutrons.

brainpy implements the BRAIN algorithm which collapses fine structure, as described in “BRAIN: a universal tool for high-throughput calculations of the isotopic distribution for mass spectrometry.,” Anal. Chem., vol. 85, no. 4, pp. 1991–4, Feb. 2013.

If you want a tool to generate fine structure, check out https://github.com/MatteoLacki/IsoSpec which has Python bindings at IsoSpecPy already available on the Python package index.

Isotopic fine structure is only visible above a certain resolution, and will depend upon your instrument. Do you know what the settings on your instrument (detector type, resolution) and whether you see larger isotopologues or isotopomers?

IgnacioJPickering commented 3 years ago

Thank you for the prompt response!

I'm not exactly sure what you mean by "difference in masses of neutrons", as I said I'm not knowledgable about this topic, but my guess is that the deviation I'm seeing, which is about ~ 7.5 m/z is a bit too large. I don't really care about anything after the first 3-4 sig figs (i.e. I care if the number is 924 m/z or 931 m/z but nothing else) I don't really need that level of accuracy, my issue is that brainpy predicts about ~ 931 m/z as the largest peak, but even a naive calculation of the m/z by using the masses of the most abundant isotopes of each element gives a value between 922-924 m/z in this case.

I knew BRAIN implemented an approximation but I didn't know it would be possible for it to fail to this extent (~ 7.5 m/z or 1% of the m/z value), but maybe I'm misunderstanding the use case of this algorithm, I'm guessing this makes much more sense being designed for much larger biomolecules / proteins, and not so much for small organic molecules such as the one I'm trying to predict.

I used the package since the example shown in the brainpy's README plots a MS spectra of a relatively small organic molecule at approximately this scale, (800 - 805 m/z) so I recommend some sort of disclaimer is added to the documentation if these differences are to be expected.

Thanks a lot for the IsoSpec recommendation! I will definitely check that out, and thank you again for helping with this issue.

IgnacioJPickering commented 3 years ago

Also, I wanted to add that the difference of m/z between peaks is pretty accurate, the issue is that all peaks seem to be displaced by ~ 7 m/z, which lead my further to believe that this is some sort of bug that is displacing values by ~ 7 in this case, instead of an artifact of some approximation in the algorithm, which I would guess would probably give very small random deviations for each peak, but again, maybe this is just a consequence of this "fine structure collapse" that I'm not fully grasping.

I attach for comparison the values predicted by the applet using a low resolution spectra, as you can see even in this case the maximum peak is ~ 925 m/z, pretty far from brainpy's prediction of 931

917 0.9
918 0.4
919 9.9
920 12.3
921 39.6
922 28.7
923 92.5
924 49.2
925 100
926 47.9
927 54.3
928 23.8
929 15.3
930 5.8
931 2
932 0.6
933 0.1
mobiusklein commented 3 years ago

Thank you for reporting the issue, and then clarifying it when I dismissed it in haste.

I was paying too much attention to that second peak list thinking you were talking about distance between peaks. Now I understand it's that the theoretical most abundant peak is wrong. I think the issue is related to Selenium, not an element I've dealt with at this level of fundamentals.

Visualizing the pattern for the composition you're using: image

Selenium's isotopes are weird:

In [11]: Se = brainpy.periodic_table['Se']

In [12]: Se.isotopes
Out[12]:
OrderedDict([(-6,
              Isotope(abundance=0.0089, neutrons=74, mass=73.9224764, neutron_shift=-6)),
             (-4,
              Isotope(abundance=0.0937, neutrons=76, mass=75.9192136, neutron_shift=-4)),
             (-3,
              Isotope(abundance=0.0763, neutrons=77, mass=76.919914, neutron_shift=-3)),
             (-2,
              Isotope(abundance=0.2377, neutrons=78, mass=77.9173091, neutron_shift=-2)),
             (0,
              Isotope(abundance=0.4961, neutrons=80, mass=79.9165213, neutron_shift=0)),
             (2,
              Isotope(abundance=0.0873, neutrons=82, mass=81.9166994, neutron_shift=2))])

If I try to simplify the problem, I can isolate it:

In [33]: brainpy.isotopic_variants({"Se": 1}, npeaks=10)
Out[33]:
[Peak(mz=79.916521, intensity=0.086745, charge=0),
 Peak(mz=82.075165, intensity=0.913255, charge=0)]

Note how the minimal mass peak (the "monoisotopic" peak) has a mass equal to the true monoisotopic mass for Se[78], but has the abundance of Se[74], while

I suspect that there is an error in the logic somewhere under https://github.com/mobiusklein/brainpy/blob/master/brainpy/_c/isotopic_constants.pyx#L274, probably in https://github.com/mobiusklein/brainpy/blob/master/brainpy/_c/isotopic_constants.pyx#L102 where it assumes that the lowest isotope is 0 or at least non-negative, which is not the case for Se.

Unfortunately, I have a pretty full log of work right now. If IsoSpec can do the job for you, I'd rather you use that than wait while I try to fix this now, as it'd take me a while just to get to this.

IgnacioJPickering commented 3 years ago

Ahh, I see what the issue may be! I think you narrowed it down pretty well. IsoSpec does the job for me right now, so I've switched to that, I'm also pretty busy but in the future if you can't find time to fix this I may do a PR, since I like the documentation/API of brainpy a bit more.

jgardner-gmet commented 1 year ago

Hi both,

I know I'm about 2 years late on this, but wanted to say I really appreciate this package and that I think I have a resolution to this. I seemed to notice this behavior with iron (in heme) and found that there were a couple things to resolve (in the pure pythonic):

This should allow you to undo the comment for Fe if I'm not mistaken.

Anyway, thanks again for sharing this, hope this captures your intent.

mobiusklein commented 1 year ago

Thank you for the suggested solution. Is this the region you are referring to modifying for the second point?

https://github.com/mobiusklein/brainpy/blob/master/src/brainpy/brainpy.py#L576-L583

            for element, ele_sym_poly in ele_sym_poly_map.items():
                center += self.composition[element] * sign * ele_sym_poly[i] *\
                 self.monoisotopic_peak.intensity * self._isotopic_constants[element].element.monoisotopic_mass()
jgardner-gmet commented 1 year ago

Yes! That's correct. Apologies for the delay and for not linking the code directly.