veitveit / PhosFake

Move detectability calculation to digestion to avoid overhead #1

Closed veitveit closed 1 month ago

veitveit commented 3 months ago

Also change to quantile filtering for the RFScore.

veitveit commented 2 months ago

Quantile filtering is done. I am not yet sure whether to move the detectability calculation to digestion.
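For readers unfamiliar with the idea, quantile filtering can be sketched as follows. This is only an illustration in Python, not the PhosFake code: the function names, the peptide labels, the scores, and the choice to keep scores at or above the cutoff are all assumptions.

```python
def quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation.

    Assumes `values` is non-empty.
    """
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional position in the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_by_quantile(scores, q):
    """Keep peptides whose score is at or above the q-th quantile.

    `scores` maps a peptide label to a hypothetical detectability score.
    """
    cutoff = quantile(list(scores.values()), q)
    return {pep: s for pep, s in scores.items() if s >= cutoff}


# Hypothetical scores, for illustration only
scores = {"A": 0.1, "B": 0.4, "C": 0.6, "D": 0.9, "E": 0.3}
kept = filter_by_quantile(scores, q=0.5)   # keeps the upper half: B, C, D
```

Compared with a fixed score threshold, a quantile cutoff removes roughly the same fraction of peptides regardless of how the scores happen to be distributed.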

mlocardpaulet commented 1 month ago

Just a thought: right now, if I understand correctly, we "just" remove the less detectable peptides. So in our pipeline, the quantification is never impacted by flyability, right? I wonder whether we should instead apply a "flyability factor" to the peptide quantities, and then filter based on the resulting relative quantities...

veitveit commented 1 month ago

As I understand it, detectability combines flyability and LC "performance". Is there any study that is able to predict actual flyability? The last time I heard that term, they corrected it to detectability when I asked :-).

mlocardpaulet commented 1 month ago

I think that you are right, sorry. But then replace "flyability" by "detectability" in my message. The question remains...

veitveit commented 1 month ago

Oops, I did not realize that :-)

veitveit commented 1 month ago

Now I think I got your point :-).

So you mean the detectability could be used to filter on the basis of the quantitative values? The question then would be how. There seem to be many possibilities.

mlocardpaulet commented 1 month ago

We could multiply each quantity by the detectability factor determined by the model. This would reduce the signal of the less detectable peptides. Then, when applying the detection threshold, we would remove the least abundant peptidoforms, and their removal would reflect the combination of their initial signal and their detectability.
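A minimal sketch of this proposal, in Python and purely for illustration: the function name, the peptide labels, the values, and the threshold are all hypothetical, and quantities are assumed to be on a linear (not log) scale.

```python
def apply_detectability(quantities, detectability, threshold):
    """Scale each quantity by its detectability factor, then drop
    every peptidoform whose weighted quantity falls below `threshold`.

    Both dicts are keyed by the same hypothetical peptidoform labels.
    """
    weighted = {pep: q * detectability[pep] for pep, q in quantities.items()}
    return {pep: w for pep, w in weighted.items() if w >= threshold}


# Hypothetical linear-scale quantities and detectability factors
quantities = {"PEP_A": 100.0, "PEP_B": 100.0, "PEP_C": 10.0}
detectability = {"PEP_A": 0.9, "PEP_B": 0.2, "PEP_C": 0.9}

kept = apply_detectability(quantities, detectability, threshold=30.0)
# PEP_B is removed because of low detectability despite high abundance;
# PEP_C is removed because of low abundance despite high detectability.
```

If the quantities were stored on a log scale, the multiplication would become an addition of the log-transformed factor.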

veitveit commented 1 month ago

Not sure how much impact this would have on the distributions as they then will be influenced by the distribution of detectability scores. Or am I wrong here?

mlocardpaulet commented 1 month ago

Yeah... Also, the detectability factor as calculated with peptideranger has nothing to do with signal (although it may be a confounding factor). The metric used for training is the number of runs in which a given peptide is identified divided by the total number of runs.
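To make that training metric concrete, here is a small Python sketch. It only illustrates the ratio described above; the function name and the run data are hypothetical and nothing here is taken from peptideranger itself.

```python
def detection_frequency(identified_per_run):
    """Fraction of runs in which each peptide was identified.

    `identified_per_run` is a list with one set of peptide labels per run.
    Peptides never identified do not appear in the result.
    """
    n_runs = len(identified_per_run)
    counts = {}
    for run in identified_per_run:
        for pep in run:
            counts[pep] = counts.get(pep, 0) + 1
    return {pep: c / n_runs for pep, c in counts.items()}


# Four hypothetical runs
runs = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
freq = detection_frequency(runs)   # A: 3/4, B: 2/4, C: 1/4
```

The measured intensities never enter this calculation, which is why the resulting score is about identification frequency rather than signal.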

veitveit commented 1 month ago

Done