rformassspectrometry / MetaboAnnotation

High level functionality to support and simplify metabolomics data annotation.
https://rformassspectrometry.github.io/MetaboAnnotation/

filterMatches, ScoreThresholdParam #87

Closed: andreavicini closed this 1 year ago

andreavicini commented 1 year ago

Adds filterMatches, ScoreThresholdParam to filter the matches based on a threshold for the "score" variable. Filtering based on "score_rt" is also possible with filterScoreRt = TRUE. I'm not sure that's the best way; alternatively, we could use a parameter scorename that takes the name of the score to filter on, with the default value "score".
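To illustrate the scorename alternative, here is a minimal, language-neutral sketch in Python (not MetaboAnnotation's R API). It assumes the matches are rows with query_idx, target_idx and one or more numeric score columns, and that higher scores are better. filter_by_score and its score_name argument are hypothetical stand-ins for the proposed parameter.

```python
# Hypothetical sketch: each match is a dict with query_idx, target_idx
# and one or more numeric score columns (e.g. "score", "score_rt").

def filter_by_score(matches, threshold, score_name="score"):
    """Keep matches whose value in `score_name` is >= threshold.

    `score_name` plays the role of the proposed `scorename` parameter;
    the default mirrors filtering on "score". Assumes higher is better.
    """
    return [m for m in matches
            if m.get(score_name) is not None and m[score_name] >= threshold]


matches = [
    {"query_idx": 1, "target_idx": 3, "score": 0.9, "score_rt": 0.2},
    {"query_idx": 1, "target_idx": 5, "score": 0.4, "score_rt": 0.8},
    {"query_idx": 2, "target_idx": 1, "score": 0.7, "score_rt": 0.6},
]

print(filter_by_score(matches, 0.5))                         # filters on "score"
print(filter_by_score(matches, 0.5, score_name="score_rt"))  # filters on "score_rt"
```

With a single name parameter, filtering on "score_rt" needs no dedicated flag such as filterScoreRt, and any future score column works the same way.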

andreavicini commented 1 year ago

I agree with you; I should have updated it accordingly. I was not sure whether the user would know the names of the "score variables" available in an object. Maybe we could return these names with a simple function (assuming they are the column names in object@matches, excluding query_idx and target_idx)? Alternatively, users could read them directly from colnames(object), but that also includes variables that are not in object@matches.

jorainer commented 1 year ago

@andreavicini, regarding your question about a function that returns the score column names: I think that's a great idea! The only question is how to name that function. Should it be matchedVariables or scoreVariables? Happy to discuss.

andreavicini commented 1 year ago

@jorainer, I'm fine with either, or it could also be matchesVariables. Do you think there would be use cases where object@matches contains a column that is not a score?

jorainer commented 1 year ago

Hm, yes, I think we also add other information to @matches, such as the adduct (if I'm not wrong). So from matchedVariables I would expect to get all variables that are present in the matchedData data frame (as with spectraVariables and spectraData). I would therefore lean slightly in favour of scoreVariables, but can be convinced of other choices.

andreavicini commented 1 year ago

Yes, you are right about the adduct columns (don't know why I didn't think of that 😅). But now I'm thinking: if we want a function that returns only the score columns in @matches, how do we extract those? Maybe we cannot assume that they are always called "score" and "score_rt", or can we? Alternatively, matchesValues could simply return all column names in @matches (except query_idx and target_idx), and the documentation could state that only score variables should be used with filterMatches, ScoreThresholdParam.
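A minimal Python sketch of the "return everything except the index columns" option, assuming the matches table is described by a plain list of column names. matches_variables is a hypothetical helper, not an existing MetaboAnnotation function.

```python
# Hypothetical helper: return every column of the matches table
# except the two index columns that link queries to targets.

INDEX_COLUMNS = {"query_idx", "target_idx"}


def matches_variables(column_names):
    """All matches columns except query_idx and target_idx, in original order."""
    return [c for c in column_names if c not in INDEX_COLUMNS]


cols = ["query_idx", "target_idx", "score", "score_rt", "adduct"]
print(matches_variables(cols))  # ['score', 'score_rt', 'adduct']
```

As the example shows, this approach also returns non-score columns such as "adduct", which is why the documentation would need to tell users which of them are valid for score-based filtering.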

andreavicini commented 1 year ago

Unrelated question: do you think it would ever be helpful to filter based on character variables in @matches, such as adducts?

jorainer commented 1 year ago

For the former, we could use a simple grep that searches for column names in @matches containing "score" and have scoreVariables return these.
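The grep idea can be sketched in Python with a regular-expression search over the column names, mirroring what grep("score", ...) does in R. score_variables is a hypothetical name for illustration only.

```python
import re


def score_variables(column_names, pattern="score"):
    """Hypothetical helper: columns whose name matches `pattern` (grep-like)."""
    return [c for c in column_names if re.search(pattern, c)]


cols = ["query_idx", "target_idx", "score", "score_rt", "adduct"]
print(score_variables(cols))  # ['score', 'score_rt']
```

The trade-off is that this relies on a naming convention: any column whose name contains "score" is treated as a score, and a score column named differently would be missed.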

For the second question: yes, I think it would be helpful to also filter based on e.g. "adduct" or other character columns.
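Filtering on a character column could work by keeping only matches whose value lies in a set of allowed values. The Python sketch below assumes an "adduct" column with string values; filter_by_value and the example adduct labels are illustrative, not part of the package.

```python
# Hypothetical sketch: keep matches whose value in a character column
# (e.g. "adduct") is one of the allowed values.

def filter_by_value(matches, column, allowed):
    """Keep matches where matches[column] is in `allowed`; rows lacking the column are dropped."""
    allowed = set(allowed)
    return [m for m in matches if m.get(column) in allowed]


matches = [
    {"query_idx": 1, "target_idx": 3, "adduct": "[M+H]+"},
    {"query_idx": 1, "target_idx": 5, "adduct": "[M+Na]+"},
    {"query_idx": 2, "target_idx": 1, "adduct": "[M+H]+"},
]

print(filter_by_value(matches, "adduct", ["[M+H]+"]))
```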