rickhelmus / patRoon

Workflow solutions for mass-spectrometry based non-target analysis.
https://rickhelmus.github.io/patRoon/
GNU General Public License v3.0

merge generateCompounds from different tools #108

Closed Boris-Droz closed 2 months ago

Boris-Droz commented 2 months ago

Hi Rick, I am trying to merge the results of generateCompounds obtained with MetFrag and with an in-house library. At first I thought I could use consensus(), but this does not give me exactly what I want. Maybe I am lost with the settings. Briefly:
1. I run generateCompounds with MetFrag and, separately, with a library.
2. I would like a compounds object that merges the information from the two tools and keeps the highest match score, in order to increase the level of confidence of my final annotation.
Using consensus decreases my final level of confidence, because I have some LC2a in MetFrag that I don't have in the library, or the reverse.

Thank you for your help. Best, Boris

rickhelmus commented 2 months ago

Hi @Boris-Droz

Interesting question... in principle, taking a consensus is indeed the way to go to merge the results. There are a few options to control this process, but I'm unsure if they would help you in this case. There might be a few reasons for candidates to drop in ID level; from what I can quickly foresee, it could be related to the re-ranking of candidates or the averaging of scores that occurs while making the consensus.

Perhaps to get some more hints on what is actually happening, you could check and compare a few log files that annotateSuspects() generates? They are by default stored in log/ident and are text files with information for each suspect hit. Then we can maybe think of some things to improve this process.

Thanks, Rick

Boris-Droz commented 2 months ago

Hi Rick, thanks for the help. You are right to remind me that the main purpose of consensus is to average between tools. What I am looking for is a little bit different. As different tools work differently, I was expecting to annotate some features with a high level of confidence using one tool that I cannot annotate with another. So I see a large advantage in doing this for prioritization. I was hoping that I had made a mistake in the consensus parameters and that this was possible with some parameter of the function.

Anyway, I tried to chase down how to improve this process using log/ident. That raised more questions about how the consensus process works. For the test I focused on one specific feature. For the MetFrag annotation only, I get:
Checking ID level type 'individualMoNAScore' ID level type passed! assigned level '2a'!

For LibMatch I get:
Checking ID level type 'libMatch' ID level type passed! assigned level '2a'!

Then, using a consensus between MetFrag + libMatch, I was expecting a level of confidence of 2a, but I get:
Checking level '3c' Checking ID level type 'annMSMSSim' (for compound) ID level type passed! assigned level '3c'!

I tried to change the parameters of the consensus function but always get the same values. Thank you again for your help. Boris

rickhelmus commented 2 months ago

Hi Boris,

I just quickly tried to do something similar in order to have some data to test. Using the patRoon demo data and suspects lists, I got

So there was one candidate in the consensus that got degraded to a level 3, but that's because with the consensus it was ranked second instead of first. I think this is usually quite reasonable, but if you are really sure it's not, then perhaps you could somehow filter out unwanted candidates from the compounds object (either before or after making a consensus), e.g. by using the delete() function. Another option is to adjust the ID rules and remove the constraint of being top ranked, but that may not always be wanted... Or perhaps you have some suggestions on what could be done in these scenarios?
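To make the re-ranking effect concrete, here is a toy sketch in base R. It does not reproduce patRoon's actual consensus algorithm: the candidate names, the scores and the plain averaging are all made-up assumptions, chosen only to show how a candidate that one tool ranks first can end up second once scores from two tools are combined.

```r
# Toy illustration only: hypothetical candidates and normalized scores,
# NOT patRoon's real consensus scoring.
metfrag <- data.frame(candidate = c("A", "B", "C"), score = c(0.90, 0.85, 0.40),
                      stringsAsFactors = FALSE)
libMatch <- data.frame(candidate = c("A", "B"), score = c(0.50, 0.95),
                       stringsAsFactors = FALSE)

# Outer join by candidate; a candidate missing from one tool gets NA there
merged <- merge(metfrag, libMatch, by = "candidate", all = TRUE,
                suffixes = c("_MF", "_lib"))

# Assumption: combine the tools by averaging whatever scores are available
merged$scoreCons <- rowMeans(merged[, c("score_MF", "score_lib")], na.rm = TRUE)

# Rank (1 = best) within MetFrag alone and for the combined score
merged$rank_MF <- rank(-merged$score_MF, na.last = "keep")
merged$rankCons <- rank(-merged$scoreCons)

print(merged[order(merged$rankCons), ])
```

In this made-up case candidate A is ranked first by MetFrag alone but second after combining, so an ID rule that requires the top rank would no longer apply to it.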

Thanks, Rick

Boris-Droz commented 2 months ago

Hi Rick, thank you for your insight. I am curious: did you use the default parameters for your test case with the patRoon demo data and suspect lists, or which parameters did you use? Thank you, Boris

rickhelmus commented 2 months ago

Hi Boris,

All was mostly with defaults. Below is the script I used, which is mostly a template from newProject() with a few additions at the end.

# Script automatically generated on Mon Apr 29 16:13:00 2024

library(patRoon)

# -------------------------
# initialization
# -------------------------

workPath <- "E:/devel/tests/test2"
setwd(workPath)

# Example data from patRoonData package (triplicate solvent blank + triplicate standard)
anaInfo <- patRoonData::exampleAnalysisInfo("positive")

# -------------------------
# features
# -------------------------

# Find all features
# NOTE: see the reference manual for many more options
fList <- findFeatures(anaInfo, "openms", noiseThrInt = 1000, chromSNR = 3, chromFWHM = 5, minFWHM = 1, maxFWHM = 30)

# Group and align features between analyses
fGroups <- groupFeatures(fList, "openms", rtalign = TRUE)

# Basic rule based filtering
fGroups <- filter(fGroups, preAbsMinIntensity = 100, absMinIntensity = 10000, relMinReplicateAbundance = 1,
                  maxReplicateIntRSD = 0.75, blankThreshold = 5, removeBlanks = TRUE,
                  retentionRange = NULL, mzRange = NULL)

# -------------------------
# suspect screening
# -------------------------

# Get example suspect list
suspList <- patRoonData::suspectsPos

# Set onlyHits to FALSE to retain features without suspects (eg for full NTA)
# Set adduct to NULL if suspect list contains an adduct column
fGroups <- screenSuspects(fGroups, suspList, rtWindow = 12, mzWindow = 0.005, adduct = "[M+H]+", onlyHits = TRUE)

# -------------------------
# annotation
# -------------------------

# Retrieve MS peak lists
avgMSListParams <- getDefAvgPListParams(clusterMzWindow = 0.005)
mslists <- generateMSPeakLists(fGroups, "mzr", maxMSRtWindow = 5, precursorMzWindow = 4,
                               avgFeatParams = avgMSListParams,
                               avgFGroupParams = avgMSListParams)
# Rule based filtering of MS peak lists. You may want to tweak this. See the manual for more information.
mslists <- filter(mslists, absMSIntThr = NULL, absMSMSIntThr = NULL, relMSIntThr = NULL, relMSMSIntThr = 0.05,
                  topMSPeaks = NULL, topMSMSPeaks = 25)

# Calculate formula candidates
formulas <- generateFormulas(fGroups, mslists, "genform", relMzDev = 5, adduct = "[M+H]+", elements = "CHNOP",
                             oc = FALSE, calculateFeatures = TRUE,
                             featThresholdAnn = 0.75)

# Calculate compound structure candidates
compounds <- generateCompounds(fGroups, mslists, "metfrag", dbRelMzDev = 5, fragRelMzDev = 5, fragAbsMzDev = 0.002,
                               adduct = "[M+H]+", database = "pubchemlite",
                               maxCandidatesToStop = 2500)
compounds <- addFormulaScoring(compounds, formulas, updateScore = TRUE)

# Annotate suspects
fGroups <- annotateSuspects(fGroups, formulas = formulas, compounds = compounds, MSPeakLists = mslists,
                            IDFile = "idlevelrules.yml")

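# Load an MS library (MSP format) and generate compound candidates by library matching
mslib <- loadMSLibrary("~/../Downloads/MassBank_NIST (1).msp", "msp")
compoundsLib <- generateCompounds(fGroups, mslists, "library", mslib, adduct = "[M+H]+")

# Annotate suspects with the library candidates only
fGroupsLib <- annotateSuspects(fGroups, formulas = formulas, compounds = compoundsLib, MSPeakLists = mslists,
                               IDFile = "idlevelrules.yml")

# Make a consensus of the MetFrag and library candidates
compoundsCons <- consensus(compounds, compoundsLib)

# Annotate suspects with the consensus candidates
fGroupsCons <- annotateSuspects(fGroups, formulas = formulas, compounds = compoundsCons, MSPeakLists = mslists,
                                IDFile = "idlevelrules.yml")

# Collect the screening results (incl. estimated ID levels) for each annotation
siMF <- screenInfo(fGroups)
siLib <- screenInfo(fGroupsLib)
siCons <- screenInfo(fGroupsCons)
# Keep consensus hits that also occur in the MetFrag or library results
siCons <- siCons[name %in% c(siMF$name, siLib$name)]
# Add the ID levels from the MetFrag-only and library-only annotations
siCons[, IDL_MF := {
    n <- name
    siMF[match(n, name)]$estIDLevel
}]
siCons[, IDL_lib := {
    n <- name
    siLib[match(n, name)]$estIDLevel
}]
# Keep suspects whose consensus ID level is worse (numerically higher) than the best individual level
siCons <- siCons[numericIDLevel(estIDLevel) > pmin(numericIDLevel(IDL_MF), numericIDLevel(IDL_lib))]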
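For the prioritization use case raised earlier in the thread (keeping, per suspect, the best ID level that any single tool achieved), a minimal base R sketch could look as follows. This is only an assumption about one way to do it: it merges screening tables rather than compounds objects, the two data frames are hypothetical stand-ins for the `screenInfo()` outputs, and `levelKey()` is a made-up helper, not a patRoon function.

```r
# Hypothetical stand-ins for screenInfo(fGroups) and screenInfo(fGroupsLib);
# only the 'name' and 'estIDLevel' columns are mimicked here.
siMF <- data.frame(name = c("suspA", "suspB", "suspC"),
                   estIDLevel = c("2a", "4a", "3c"), stringsAsFactors = FALSE)
siLib <- data.frame(name = c("suspA", "suspB", "suspD"),
                    estIDLevel = c("3c", "2a", "2a"), stringsAsFactors = FALSE)

# Outer join: a suspect missing from one tool gets NA for that tool
both <- merge(siMF, siLib, by = "name", all = TRUE, suffixes = c("_MF", "_lib"))

# Made-up helper: sortable key, numeric part first, then sub-level letter
levelKey <- function(lvl) {
    num <- as.numeric(sub("[a-z]+$", "", lvl))
    sublevel <- sub("^[0-9]+", "", lvl)
    # Missing levels sort last
    ifelse(is.na(lvl), Inf, num * 100 + match(sublevel, c("", letters)))
}

# Per suspect, keep the better (lower) level and remember which tool gave it
useMF <- levelKey(both$estIDLevel_MF) <= levelKey(both$estIDLevel_lib)
both$bestIDLevel <- ifelse(useMF, both$estIDLevel_MF, both$estIDLevel_lib)
both$bestTool <- ifelse(useMF, "metfrag", "library")

print(both)
```

With real data, the two toy tables would be replaced by the actual screening tables; ties are resolved in favour of MetFrag here, which is an arbitrary choice.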

Thanks, Rick

Boris-Droz commented 2 months ago

Hi Rick, thank you very much for the help; I really appreciate your support on this. With the help you provided, I was able to work around and resolve my issue. Best, Boris