pavel-shliaha closed this issue 8 years ago.
While looking at this I found something strange in the `performance` function:

```r
## Ident peptides
I <- object$IdentPeptideData$precursor.leID
uI <- unique(I)
## Quant peptides
Q <- object$QuantPeptideData$precursor.leID
uQ <- unique(Q)
w <- c(length(setdiff(uQ, uS)),
       length(setdiff(uS, uQ)),
       length(intersect(uS, uQ)))
names(w) <- c("Q", "S", "QS")
```
What do the differences/intersection of `precursor.leID`s tell us? Isn't that useless? E.g., if we use a master file, we regenerate the `precursor.leID`s as `1:nrow(x)`. So doesn't the same `precursor.leID` in different runs mean nothing, or am I wrong?
Shouldn't we do something like:

```r
intersect(quant$peptide.seq, ident$peptide.seq)
```
Sorry, my fault. I missed the most important line:
```r
## synapter results
S <- object$MatchedEMRTs[object$MatchedEMRTs$matchedEMRTs == 1,
                         "spectrumID"]
uS <- unique(S)
```
The `spectrumID` corresponds to `quant$precursor.leID`, so it seems all right.
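To make the counting concrete, here is a minimal, self-contained sketch of the comparison above. The small ID vectors are hypothetical stand-ins for `object$QuantPeptideData$precursor.leID` and the `spectrumID`s of the uniquely matched EMRTs; it only illustrates what the `setdiff`/`intersect` counts mean and is not the actual synapter code.

```r
## Hypothetical quant precursor.leIDs (duplicates can occur)
Q <- c(1, 2, 2, 3, 4, 5)
## Hypothetical spectrumIDs of uniquely matched EMRTs (matchedEMRTs == 1);
## they live in the same id space as the quant precursor.leIDs
S <- c(2, 3, 3, 6, 7)

uQ <- unique(Q)
uS <- unique(S)

## Q:  quant ids without a synapter match
## S:  synapter matches whose id is not among the quant ids
## QS: ids present in both
w <- c(length(setdiff(uQ, uS)),
       length(setdiff(uS, uQ)),
       length(intersect(uS, uQ)))
names(w) <- c("Q", "S", "QS")
w
## Q = 3, S = 2, QS = 2
```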
Tested it with the new synapter version. It does not seem to work.
@pavel-shliaha sorry if this was misleading. I haven't fixed anything yet.
no probs, sorry
OK, maybe the problem is that I didn't document the `plotFragmentMatching` function properly.
The EMRTs are categorised into `unique-true`/`unique-false` (or `non-unique-true`/`non-unique-false`) according to the `precursor.leID.quant` column in the `MergedFeatures` data.frame: same id == true match, otherwise false match. I have now changed this in the following commit: https://github.com/lgatto/synapter/commit/874797256762ffa95c16c4e69cab83fd107630d1 . With it, same id == true match, different id == false match, and no quant id available == `no-quant-id` (previously treated as a false match).
That's because we rely on the PLGS identification to decide whether a match is true or false. In the current example, the `MergedFeatures` data.frame has 9917 EMRTs.
There are 6165 EMRTs that are uniquely matched (by the grid search) to the same quant.id as PLGS did (== `unique-true`; blue dots/line, left panel). Additionally, there are 2244 EMRTs that are matched, among others, to the same quant.id as PLGS did (== `non-unique-true`; blue dots/line, right panel). These come with 3417 other matches, and if we accepted a delta in the number of common peaks of 0, we would wrongly classify around 1500 of them as `true-match` (red dots/line, right panel). 461 EMRTs are uniquely matched to a different quant.id than PLGS (== `unique-false`; red dots/line, left panel).
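The categorisation described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not synapter's implementation: the toy data.frame is hypothetical (one row per EMRT/candidate-quant-id match), and its columns `emrt` and `quant.id` are invented for illustration; only `precursor.leID.quant` (the PLGS-assigned id, `NA` when none is available) is named in the discussion.

```r
## Hypothetical toy data: one row per (EMRT, candidate quant id) match.
## 'emrt' and 'quant.id' are invented column names; 'precursor.leID.quant'
## holds the PLGS-assigned quant id (NA = no quant id available).
mf <- data.frame(
  emrt                 = c(1, 2, 3, 3, 4, 5),
  quant.id             = c(10, 20, 30, 31, 40, 50),
  precursor.leID.quant = c(10, 21, 30, 30, NA, 50)
)

## an EMRT is "unique" if the grid search matched it to exactly one quant id
unique_match <- ave(mf$emrt, mf$emrt, FUN = length) == 1

## compare each match with the PLGS id (scheme after the commit above)
truth <- ifelse(is.na(mf$precursor.leID.quant), "no-quant-id",
                ifelse(mf$quant.id == mf$precursor.leID.quant,
                       "true", "false"))

## combine uniqueness and truth into the category labels
mf$category <- ifelse(truth == "no-quant-id", truth,
                      paste(ifelse(unique_match, "unique", "non-unique"),
                            truth, sep = "-"))
table(mf$category)
```

In this toy data, EMRT 3 is `non-unique-true` through one row, and its second row (`non-unique-false`) is one of the "other matches" that a permissive peak threshold could misclassify.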
`performance` reports 12813 uniquely matched EMRTs because we apply the fragment matching filter rules to all EMRTs (not only to `MergedFeatures`).
We use the `plotFragmentMatching` function to estimate the error we would make if we filter by a specific threshold, based on the "ground truth" we have in the `MergedFeatures` data.frame. That's why it is expected that `performance` reports a higher number of unique EMRTs than `plotFragmentMatchingPerformance`.
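The idea of estimating the error of a threshold from the `MergedFeatures` "ground truth" can be sketched as follows. This is a hypothetical illustration, not synapter's code: the vectors `npeaks` (number of common fragment peaks per match) and `is_true` (whether the match agrees with PLGS) are invented, and a simple "keep matches with at least `t` common peaks" rule stands in for the actual filter.

```r
## Hypothetical per-match data: number of common fragment peaks and whether
## the match agrees with the PLGS assignment ("ground truth")
npeaks  <- c(0, 1, 1, 2, 3, 4, 5, 6)
is_true <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)

thresholds <- 0:6
## for each threshold, count the true and false matches passing the filter
perf <- data.frame(
  threshold = thresholds,
  tp = sapply(thresholds, function(t) sum(is_true & npeaks >= t)),
  fp = sapply(thresholds, function(t) sum(!is_true & npeaks >= t))
)
## fraction of retained matches that are false (empirical error estimate)
perf$fdr <- perf$fp / (perf$tp + perf$fp)
perf
```

Because the labels only exist for the `MergedFeatures` subset, such a table estimates the error rate of a threshold; applying the chosen threshold to all EMRTs (as `performance` does) then yields more matches than this labelled subset contains.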
Sorry, Sebastian, my bad. I now understand that what happens actually makes sense! I will resume work on the synapter paper shortly.
The `performance` and `performance2` functions behave very strangely. For the data in:

```r
## "Y:" is \prot-filesvr1; forward slashes avoid R's backslash escapes
setwd("Y:/RAW/pvs22_QTOF_DATA_data3/synapter2paper/kuharev2015/bugs_investigation/S130423_09")
synapterAnalysis <- readRDS("synapterAnalysis.RDS")
```

then I use

```r
plotFragmentMatchingPerformance(synapterAnalysis)
```

As you can see, there are only ca. 6000 uniquely matched EMRTs, not 12813.