sirius-ms / sirius

SIRIUS is software for discovering a landscape of de novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS software (GUI and CLI).
GNU Affero General Public License v3.0

Result Interpretation - missing "Posterior Probability or Normalized Scores" in the TSV Result Files #46

zmahnoor14 closed this issue 2 years ago

zmahnoor14 commented 3 years ago

Hello,

Thank you for your efforts in developing and updating SIRIUS, which is an amazing tool for Metabolite Identification.

For the "result output", the standardized directory structure is easy to understand. SIRIUS reports the highest-scoring formula and structure identifications in the project-space directory. However, I would also like to check the other candidate results from the formula, structure, and CANOPUS tools, which are stored in the compound-level directories.

Since I am using the CLI version, I get the results in TSV format. The result tables for "formula", "structure", "predicted fingerprints", and "canopus" don't have a "probability score" column or a column that gives "normalized scores". However, if I load this project space in the SIRIUS GUI, there is a probability score or normalized score from 0-1 or 0-100%, which is easier to interpret than the SIRIUS or CSI-FingerID scores.

Would it be possible to add these probability or normalized scores to the TSV result files, to make automated processing of the results easier? Documentation of the "result output" that explains the TSV files would also be very helpful for interpreting the results. Finally, there are some files named canopus_neg.tsv and csi_fingerid_neg.tsv; could you explain what "neg" means here?

Best Regards, Mahnoor

kaibioinfo commented 3 years ago

Hi, the canopus.csv file lists all compound classes together with their "relative index". For each compound and its molecular formula annotation, there is a file [compound-name]/canopus/[formula_adduct].fpt.

This is a plain text file containing the posterior probabilities for all compound classes, starting with the compound class with relative index 0.
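A minimal sketch of how the two files could be joined for automated processing, assuming (1) each non-empty line of the .fpt file holds one probability, in relative-index order starting at 0, and (2) the class list file has a header row with a relative-index column and a class-name column. The column names `relativeIndex` and `name`, the tab delimiter, and the helper functions `read_posteriors`, `read_class_names`, and `top_classes` are all hypothetical; check the header of your own class list and adjust the parameters.

```python
import csv
import io


def read_posteriors(fpt_text: str) -> list[float]:
    """Parse the text of a [formula_adduct].fpt file.

    Assumption: one posterior probability per non-empty line,
    line k corresponding to relative index k.
    """
    return [float(line) for line in fpt_text.splitlines() if line.strip()]


def read_class_names(table_text: str,
                     index_col: str = "relativeIndex",  # hypothetical column name
                     name_col: str = "name",            # hypothetical column name
                     delimiter: str = "\t") -> dict[int, str]:
    """Map relative index -> compound class name from the class list file."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return {int(row[index_col]): row[name_col] for row in reader}


def top_classes(posteriors: list[float], names: dict[int, str],
                min_prob: float = 0.5) -> list[tuple[str, float]]:
    """Return (class name, posterior) pairs with posterior >= min_prob, highest first."""
    hits = [(names.get(i, f"index_{i}"), p)
            for i, p in enumerate(posteriors) if p >= min_prob]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data standing in for the real files; with real data you would
    # read them from disk, e.g. pathlib.Path(...).read_text().
    classes = "relativeIndex\tname\n0\tOrganic compounds\n1\tLipids\n2\tBenzenoids\n"
    fpt = "0.98\n0.12\n0.73\n"
    print(top_classes(read_posteriors(fpt), read_class_names(classes)))
    # [('Organic compounds', 0.98), ('Benzenoids', 0.73)]
```

The 0.5 cut-off is only an illustrative default; since every class gets its own posterior, a per-class threshold is a simple way to turn the vector into a list of predicted classes.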

You can also use the https://github.com/kaibioinfo/canopus_treemap tool to parse the project space and make this information available via a Python API.