shuzhao-li-lab / khipu

a Python library for generalized, low-level annotation of MS metabolomics

Isotope annotation in unlabeled LC-HRMS data #28

Open YasinEl opened 4 months ago

YasinEl commented 4 months ago

Hello and thank you for this package.

I am trying to use this for isotope annotation in an unlabeled LC-HRMS metabolomics dataset (natural isotope patterns). However, some aspects of this are not clear to me. For example, is it adequate to use

isotope_search_patterns = [
    # Carbon isotopes
    (1.003355, '13C/12C', (0, 0.8)),
    (2.00671, '13C/12C*2', (0, 0.8)),
    (3.010065, '13C/12C*3', (0, 0.8)),

    # Nitrogen isotopes
    (0.997035, '15N/14N', (0, 0.8)),
    (1.99407, '15N/14N*2', (0, 0.8)),
    (2.991105, '15N/14N*3', (0, 0.8)),

    # Oxygen isotopes
    (2.004245, '18O/16O', (0, 0.8)),
    (4.00849, '18O/16O*2', (0, 0.8)),
    (6.012735, '18O/16O*3', (0, 0.8)),

    # Sulfur isotopes
    (1.995796, '34S/32S', (0, 0.8)),
    (3.991592, '34S/32S*2', (0, 0.8)),
    (5.987388, '34S/32S*3', (0, 0.8))
]

as isotope patterns? I have seen that (0, 0.8) is used in all examples, but I don't know what it actually stands for. Also, is it possible to disallow missing values within the isotope patterns, as would be expected for natural isotope patterns? For example, I seem to get many cases where some steps in the isotope pattern are missing, such as:

[image]

which is not possible in a natural isotope pattern (meaning an 18O or 34S is probably mistaken for 13C2, or something like that). Is it possible to require that missing isotopes in the C series are not allowed?
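The contiguity requirement asked about here could, in principle, be checked after annotation. The following is only a minimal sketch and does not use any khipu function: `heavy_atom_count` and `carbon_series_is_contiguous` are hypothetical helpers, and the sketch assumes that each feature carries a label in the style of the list above ('13C/12C', '13C/12C*2', ...) and that the monoisotopic peak is labeled 'M0'.

```python
import re

def heavy_atom_count(label, element_tag="13C/12C"):
    """Return n for a label like '13C/12C*n' ('13C/12C' alone means n = 1).

    'M0' (assumed label of the monoisotopic peak) counts as n = 0.
    Labels belonging to other elements return None.
    """
    if label == "M0":
        return 0
    match = re.fullmatch(re.escape(element_tag) + r"(?:\*(\d+))?", label)
    if match is None:
        return None
    return int(match.group(1)) if match.group(1) else 1

def carbon_series_is_contiguous(labels):
    """True if the 13C labels present form 0, 1, 2, ... with no step missing."""
    counts = sorted({c for c in map(heavy_atom_count, labels) if c is not None})
    return counts == list(range(len(counts)))

# A complete series passes; a series that skips 13C*1 does not.
print(carbon_series_is_contiguous(["M0", "13C/12C", "13C/12C*2"]))
print(carbon_series_is_contiguous(["M0", "13C/12C*2"]))
```

Labels from other elements (for example '34S/32S') are ignored by this check, so it only constrains the carbon series.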

Also, as the actual issue: it would be great to have a notebook showing how to do this classical analysis of annotating isotope patterns with common adducts in an unlabeled dataset.

Thank you! Yasin

jmmitc06 commented 3 weeks ago

Hi Yasin,

I can't believe that I missed this issue. My apologies; it was not my intention to ignore it.

The numbers you noticed are left over from previous versions of Khipu, where we were experimenting with intensity filters on the isotopologue relations. They are not currently used in the processing.

The reason we do not use them is related to your second question about enforcing the order in which isotopologues should be seen. The observed intensity of an isotopologue is determined by a multinomial-like calculation that depends on the number of atoms of that element in the formula, the number of isotopes you are considering, and the natural abundance of the isotope(s) of that element. As such, m+13Cn is not always more intense than m+13C(n+1), so the assumption that we should not have 'gaps' in the isotopologues is not completely accurate, especially for elements such as Cl and Br.
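To make the intensity argument concrete, here is a small sketch of the two-isotope (binomial) special case of that calculation. It is not khipu code: `isotopologue_fractions` is a hypothetical helper, and the natural abundances are approximate textbook values that I am assuming here (13C about 1.07%, 37Cl about 24.24%, 81Br about 49.31%).

```python
from math import comb

def isotopologue_fractions(n_atoms, p_heavy, max_heavy):
    """Binomial probability of k heavy atoms among n_atoms, for k = 0..max_heavy."""
    return [
        comb(n_atoms, k) * p_heavy**k * (1 - p_heavy) ** (n_atoms - k)
        for k in range(max_heavy + 1)
    ]

# Approximate natural abundances of the heavy isotope (assumed values).
P_13C = 0.0107
P_37CL = 0.2424
P_81BR = 0.4931

c6 = isotopologue_fractions(6, P_13C, 2)      # small molecule: M0 > 13C1 > 13C2
c100 = isotopologue_fractions(100, P_13C, 2)  # 100 carbons: 13C1 exceeds M0
br2 = isotopologue_fractions(2, P_81BR, 2)    # two Br: the middle peak is the largest

# Relative height of the first heavy peak compared with the light peak.
c5_ratio = isotopologue_fractions(5, P_13C, 1)[1] / isotopologue_fractions(5, P_13C, 1)[0]
cl1_ratio = isotopologue_fractions(1, P_37CL, 1)[1] / isotopologue_fractions(1, P_37CL, 1)[0]

for name, values in [("C6", c6), ("C100", c100), ("Br2", br2)]:
    print(name, [round(v, 4) for v in values])
print("13C1/M0 for C5:", round(c5_ratio, 3), " 37Cl/M0 for Cl1:", round(cl1_ratio, 3))
```

For a small CHON-type molecule the carbon series decreases monotonically, which is why the concern is mostly academic there. With a single Cl, though, the 37Cl peak is several times larger than the 13C1 peak of a five-carbon molecule, so the M+1 step can fall below the detection limit while M+2 is still observed.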

However, I think that in the real world, if we consider only CHONPS elements (okay, maybe not S), the above concern is largely academic. Also, it's not worth generating lots of bad results to protect against this rare edge case. I can't promise you that I can implement that rule quickly, but it is now on the roadmap.

As for your final point, let me know if the notebooks here help resolve your issue:

https://github.com/shuzhao-li-lab/MANA2024/tree/main/Module%202%20-%20Metabolite%20Annotation%20and%20Stable%20Isotope%20Tracing

Also, if you want to use Khipu in a workflow you can find some examples here:

https://github.com/shuzhao-li-lab/asari_pcpfm_tutorials

Please do not hesitate to reach out again, and my apologies for not seeing this sooner.

p.s. if you want to talk about anything, please come find me at MANA, I'll be around all day.