Dear pLink2-Team,
I am struggling to reproduce the FDR estimates in the results file from a pLink2 (2.3.9) search. The HTML report and the filtered file show 7950 peptide pairs at 5% FDR (1180 inter-links and 6770 intra-links). I want to understand how the unfiltered results are processed to arrive at these 5% FDR peptide pairs.
Therefore, I loaded the "unfiltered" data and filtered on the Q-value column at 5%. To reduce the complexity, I am only looking at inter-protein links for now. The filtered table consists of 1180 target-target identifications, 103 target-decoy identifications and 3 decoy-decoy identifications. Using the formula (TD - DD) / TT, this gives an FDR estimate of (103 - 3) / 1180, or ~8.5%, which is well above 5%. Am I doing something wrong here? Even with Q-value estimation, the FDR should not differ this much, right?
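The arithmetic above can be checked with a short sketch. The function name `crosslink_fdr` is only illustrative and is not part of pLink. Clamping the numerator at zero is an assumption of this sketch, not something the pLink documentation states.

```python
def crosslink_fdr(tt: int, td: int, dd: int) -> float:
    """Estimate the FDR of cross-linked peptide pairs as (TD - DD) / TT."""
    if tt <= 0:
        raise ValueError("need at least one target-target identification")
    # Assumption: clamp at zero, since DD can exceed TD in small samples.
    return max(td - dd, 0) / tt


# Counts from the inter-link table after filtering on Q-value <= 5%.
fdr = crosslink_fdr(tt=1180, td=103, dd=3)
print(f"{fdr:.2%}")  # 8.47%
```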
I also tried to recompute the FDR using various approaches (the SVM_Score, Score and Refined_Score columns, each sorted both ascending and descending). However, I could not reproduce the Q-values, even with smoothing. Which column is actually the main score used for the FDR computation? From the wiki it is not 100% clear to me whether it is the Score column or the SVM_Score column (the "Score" column is not explained for the unfiltered file).
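For reference, a minimal sketch of this kind of recomputation follows. It assumes each identification has been reduced to a (score, label) pair with label "TT", "TD" or "DD". The function name, the input layout and the tie handling are assumptions of this sketch, and it is not pLink's actual procedure.

```python
from typing import List, Tuple


def q_values(hits: List[Tuple[float, str]], higher_is_better: bool = True) -> List[float]:
    """Return one q-value per hit, in input order.

    hits: (score, label) pairs, label in {"TT", "TD", "DD"}.
    Ties in score are not treated specially (assumption of this sketch).
    """
    # Rank hits from best to worst score.
    order = sorted(range(len(hits)), key=lambda i: hits[i][0], reverse=higher_is_better)

    # Cumulative FDR = (TD - DD) / TT at every rank.
    counts = {"TT": 0, "TD": 0, "DD": 0}
    fdrs = []
    for i in order:
        counts[hits[i][1]] += 1
        tt = counts["TT"]
        fdrs.append(max(counts["TD"] - counts["DD"], 0) / tt if tt else 1.0)

    # q-value = smallest FDR at this rank or at any worse-scoring rank.
    q = [0.0] * len(hits)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


# Toy data: swap in (SVM_Score, label) or (Score, label) pairs and flip
# higher_is_better to test each candidate column and sort direction.
toy = [(10.0, "TT"), (9.0, "TT"), (8.0, "TD"), (7.0, "TT"), (6.0, "TT")]
print(q_values(toy))
```

Comparing the output for each candidate column and sort direction against the reported Q-value column would show which combination, if any, matches.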
Edit: I should add that I used the separate FDR computation in pLink.
I appreciate any help!
Thanks.