pcgroup clarification - Githubissues

sneumann / CAMERA

This is the git repository matching the Bioconductor package CAMERA: Collection of annotation related methods for mass spectrometry data


pcgroup clarification #81

Closed apulvino closed 6 months ago

apulvino commented 1 year ago

Is the pcgroup ID/value equivalent to a ChemSpider database ID/value? I don't see any clarification on whether this is the case in the CAMERA documentation, only allusions to it in the documentation of other packages... Thanks in advance for the clarification!

stanstrup commented 1 year ago

Nope. The pcgroup number doesn't hold any significance in itself. All it says is that features with the same pcgroup are probably from the same molecule, based on the correlation analysis the functions have done.
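To illustrate the point above, here is a minimal sketch of treating pcgroup purely as a grouping label. It is not CAMERA code: the feature table, its values, and the `intensity` column name are made up for illustration, and only the `mz`, `rt`, and `pcgroup` fields are meant to resemble what a CAMERA peak list exported from R might contain.

```python
from collections import defaultdict

# Hypothetical feature table (all values invented for illustration).
# Each row is one feature; "pcgroup" is only a label shared by features
# that were grouped together, not an identifier in any database.
features = [
    {"mz": 181.0707, "rt": 312.4, "intensity": 52000.0, "pcgroup": 1},
    {"mz": 203.0526, "rt": 312.6, "intensity": 18000.0, "pcgroup": 1},
    {"mz": 163.0601, "rt": 312.5, "intensity": 9000.0, "pcgroup": 1},
    {"mz": 132.1019, "rt": 95.1, "intensity": 40000.0, "pcgroup": 2},
    {"mz": 86.0964, "rt": 95.2, "intensity": 10000.0, "pcgroup": 2},
]


def group_by_pcgroup(rows):
    """Collect features that share a pcgroup label into one list per label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["pcgroup"]].append(row)
    return dict(groups)


groups = group_by_pcgroup(features)
for pc, members in sorted(groups.items()):
    mzs = ", ".join(f"{m['mz']:.4f}" for m in members)
    print(f"pcgroup {pc}: {len(members)} features (m/z {mzs})")
```

The label itself could be renumbered arbitrarily without changing anything; only the membership of each group carries information.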

apulvino commented 1 year ago

Is there a useful tool which can handle annotating these data with metabolite common names?

stanstrup commented 1 year ago

Not in any magic way. Identification in metabolomics is a large subject, and there is no tool that simply does it for you. There are many complex tools that can help, though. Database matching can be done with https://github.com/rformassspectrometry/MetaboAnnotation. Typically that would be done with MS2 data, but you could use the CAMERA groups as pseudo-MS2 data.

Other complex tools include SIRIUS and the suite of GNPS tools.
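As a rough sketch of the "CAMERA groups as pseudo-MS2 data" idea, the snippet below collapses each pcgroup into one peak list with intensities scaled to the base peak. Everything here is an assumption for illustration: the input rows, the `intensity` column name, and the `pseudo_spectra` helper are hypothetical, and the output is a plain Python structure rather than a format that MetaboAnnotation, SIRIUS, or GNPS is known to accept.

```python
# Hypothetical rows (values invented): one feature per row, labelled by pcgroup.
rows = [
    {"mz": 181.0707, "intensity": 52000.0, "pcgroup": 1},
    {"mz": 203.0526, "intensity": 18000.0, "pcgroup": 1},
    {"mz": 132.1019, "intensity": 40000.0, "pcgroup": 2},
    {"mz": 86.0964, "intensity": 10000.0, "pcgroup": 2},
]


def pseudo_spectra(rows):
    """Build one pseudo-spectrum per pcgroup.

    Returns {pcgroup: [(mz, relative_intensity), ...]} with peaks sorted by
    m/z and intensities expressed as a percentage of the group's base peak.
    """
    raw = {}
    for row in rows:
        raw.setdefault(row["pcgroup"], []).append((row["mz"], row["intensity"]))

    spectra = {}
    for pc, peaks in raw.items():
        base = max(intensity for _, intensity in peaks)
        spectra[pc] = sorted(
            (mz, round(100.0 * intensity / base, 1)) for mz, intensity in peaks
        )
    return spectra


spectra = pseudo_spectra(rows)
for pc, peaks in sorted(spectra.items()):
    print(pc, peaks)
```

Such a pseudo-spectrum mixes adducts, isotopes, and in-source fragments of one compound, so any match obtained from it would still need the manual verification that spectral matching normally requires.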

apulvino commented 1 year ago

Do you know if it's possible to run the basic sirius command (i.e. sirius [OPTIONS] -z -i -1 -2 <MS/MS FILE>) without providing the ionization for every m/z and intensity value you have, and still have it predict from whatever else is provided? Which is to say, is this "complex tool" complex because you suspect it is difficult to learn, or because it is somewhat inflexible with respect to its inputs?

stanstrup commented 1 year ago

I am not using these tools myself, so I cannot help you there. I meant complex in the sense that what they do is an (internally) sophisticated process. I don't think they are that difficult to learn. But what is very important is that they cannot be trusted to provide the truth. There is so much BS published where people use these tools without thinking critically about the likelihood of the results. Identification needs to be verified by old-fashioned methods, e.g. standards, and by considering the alternative structures that are as plausible as whatever top candidate you get. You won't find any magic tool that just spits out a list of names you can sensibly use without manual curation. Those who do that only add noise to the literature. But this is getting far outside the scope of this issue tracker....

apulvino commented 1 year ago

I appreciate the resolution! My background is more in NGS-focused data analysis, so this was useful insight into the state of the field by comparison. The run I'm analyzing is targeted... Is there certain metadata about the target standards that I can ask my core for, so that I can run some sort of alignment algorithm against my experimental data?

Not sure if this is possible, or what kind of data I would be asking for in this case (in terms of format/what tool would allow me to run such an alignment).