smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Decoy Parsimony and Top-Picked FDR #1432

Open trishorts opened 5 years ago

trishorts commented 5 years ago

I propose we skip parsimony for decoy peptides and do parsimony on targets only. Then use the target groups to create replica decoy groups, and assign only the top-scoring decoy peptides to those decoy protein groups. This means every target group has an exact complement in decoy space, so we can do top-picked FDR all the way down the list.
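As a rough illustration of the proposal, here is a minimal Python sketch of building one replica decoy group per target group and taking the best decoy score for each. The `ProteinGroup` type, the `DECOY_` accession prefix, and the `(accession, score)` PSM tuples are assumptions made for the example, not MetaMorpheus code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinGroup:
    # Hypothetical stand-in for a protein group: a set of accessions plus a decoy flag.
    accessions: frozenset
    is_decoy: bool


def mirror_decoy_groups(target_groups, decoy_prefix="DECOY_"):
    """Build exactly one decoy group per target group (no decoy parsimony).

    The prefix used to name decoy accessions is an assumption for this sketch.
    """
    return {
        tg: ProteinGroup(frozenset(decoy_prefix + a for a in tg.accessions), True)
        for tg in target_groups
    }


def best_decoy_score(decoy_group, decoy_psms):
    """Return the top decoy PSM score that maps to this group, or None if there is none."""
    scores = [score for acc, score in decoy_psms if acc in decoy_group.accessions]
    return max(scores) if scores else None


# Two target groups that came out of target-only parsimony.
targets = [
    ProteinGroup(frozenset({"P1", "P2"}), False),
    ProteinGroup(frozenset({"P3"}), False),
]
pairs = mirror_decoy_groups(targets)

# Decoy PSMs as (accession, score); DECOY_P9 has no target counterpart and is ignored.
decoy_psms = [("DECOY_P1", 12.0), ("DECOY_P2", 15.5), ("DECOY_P9", 30.0)]
decoy_scores = {tg: best_decoy_score(dg, decoy_psms) for tg, dg in pairs.items()}
```

In this sketch a decoy hit whose protein has no surviving target group simply drops out, which is one possible reading of "exact complement in decoy space".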

zrolfs commented 5 years ago

I have concerns about this. I understand this as being a second-pass search that uses a reduced search space, one that is biased toward the target identifications used to construct it.

zrolfs commented 5 years ago

I'm still worried about the partial-decoy thing that we're doing (take 100 decoys; if one outscores the target, it counts as 1/100 of a decoy). Noble was generating 100 decoy DATABASES, not 100 decoys. The top decoy from each of his 100 databases was used in the comparison. Generating only 100 decoys and comparing them with the target is not a fair measure of the FDR. Target-decoy assumes that a false positive has a 50:50 chance of being assigned target or decoy. If you compare a spectrum with 900 targets and 100 decoys, there's a 90:10 chance of it being labeled a target when it's a false positive. The method might be functional, because these decoys are more targeted than random decoys, but there's no statistical foundation for doing this.
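The 90:10 figure can be checked with a short calculation. This sketch assumes an incorrect match is equally likely to land on any candidate in the search space, which is the idealization behind the 50:50 argument; the function names are made up for the example.

```python
from fractions import Fraction


def target_label_probability(n_targets, n_decoys):
    """Chance that a false positive is labeled 'target', assuming it is
    equally likely to match any one of the candidates."""
    return Fraction(n_targets, n_targets + n_decoys)


def false_targets_per_decoy(n_targets, n_decoys):
    """Expected number of false target hits for each observed decoy hit,
    under the same equal-likelihood assumption."""
    return Fraction(n_targets, n_decoys)


balanced = target_label_probability(1000, 1000)  # equal-sized target and decoy spaces
skewed = target_label_probability(900, 100)      # 900 targets vs. 100 decoys
ratio = false_targets_per_decoy(900, 100)

print(balanced, skewed, ratio)
```

Under that assumption, each decoy hit in the 900:100 case would stand for nine false targets rather than one, so counting decoys one-for-one would understate the number of false targets.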

trishorts commented 5 years ago

I don't see this as a second-pass search. I still had in mind the same searches we're doing now against target and decoy databases. The difference is in the parsimony step. I'm suggesting we don't do parsimony with the decoy matches, but instead construct decoy protein groups that match the target protein groups. Then use the top-scoring PSM for each to score the group. Top-picked simply junks the lower-scoring group of each target/decoy pair.
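A minimal sketch of that picked step, assuming each target group already has a best PSM score and its paired decoy group has either a best score or none. The tie rule (target wins), the decoys/targets FDR estimate, and all names here are assumptions for illustration, not the MetaMorpheus implementation.

```python
def picked_group_fdr(pairs):
    """pairs: list of (group_name, target_score, decoy_score_or_None).

    Keep only the higher-scoring member of each target/decoy pair, then walk
    the survivors from best to worst score and estimate q-values.
    Returns rows of [group_name, is_decoy, score, q_value].
    """
    survivors = []
    for name, target_score, decoy_score in pairs:
        if decoy_score is not None and decoy_score > target_score:
            survivors.append((decoy_score, True, name))    # decoy wins, target is discarded
        else:
            survivors.append((target_score, False, name))  # target wins (ties go to target)

    survivors.sort(key=lambda s: s[0], reverse=True)

    rows = []
    n_targets = n_decoys = 0
    for score, is_decoy, name in survivors:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        fdr = n_decoys / max(n_targets, 1)
        rows.append([name, is_decoy, score, fdr])

    # Convert running FDR to q-values: the lowest FDR at this rank or any rank below it.
    lowest = float("inf")
    for row in reversed(rows):
        lowest = min(lowest, row[3])
        row[3] = lowest
    return rows


example = [
    ("G1", 50.0, 10.0),
    ("G2", 40.0, None),   # no decoy PSM mapped to this group's complement
    ("G3", 20.0, 35.0),   # decoy outscores target, so the decoy group survives
    ("G4", 15.0, 5.0),
]
results = picked_group_fdr(example)
```

Because every target group has exactly one decoy counterpart, the competition is one-to-one for the whole list.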

rmillikin commented 5 years ago

The multi-shuffled decoy thing is a separate issue from what @trishorts is suggesting here. It's not a change in the target-decoy search strategy, just a change in parsimony.