lehtiolab / ddamsproteomics

A Nextflow MS DDA proteomics pipeline
MIT License

Removal of target-matching decoy peptides skews database #19

Closed: glormph closed this issue 1 year ago

glormph commented 1 year ago

In v2.12 we introduced shuffling of decoy peptides so that they do not match any target peptide (all fully tryptic). Xiaofang found "way too many PSMs found for plasma data", and a bad MSGF score distribution containing many low-scoring peptides.
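A minimal sketch of the behaviour described above, assuming decoys are made by shuffling each fully tryptic target peptide with its C-terminal K/R held in place, re-shuffling up to a hypothetical `max_tries` times, and dropping the decoy if every attempt still matches a target. `make_decoys`, `max_tries` and the example peptides are illustrative only, not the pipeline's actual code or parameters.

```python
import random


def make_decoys(targets, max_tries=10, seed=0):
    """Hypothetical sketch (not the pipeline's code): shuffle each tryptic
    target peptide, keeping its C-terminal residue fixed so the decoy stays
    tryptic, and retry until the decoy matches no target sequence.
    Peptides whose shuffles still match a target after max_tries are dropped."""
    rng = random.Random(seed)
    target_set = set(targets)
    decoys, dropped = [], []
    for pep in targets:
        body, cterm = list(pep[:-1]), pep[-1]
        for _ in range(max_tries):
            rng.shuffle(body)  # in-place uniform permutation of the body
            candidate = "".join(body) + cterm
            if candidate not in target_set:
                decoys.append(candidate)
                break
        else:
            # Every attempt reproduced a target sequence: remove this decoy.
            dropped.append(pep)
    return decoys, dropped


targets = ["AK", "LGK", "GLK", "PEPTIDEK", "SAMPLER"]
decoys, dropped = make_decoys(targets)
print(dropped)  # → ['AK', 'LGK', 'GLK']
```

The dropped peptides are exactly the short ones: `AK` can only ever shuffle back to itself, and `LGK`/`GLK` can only shuffle into each other, so the decoy set loses short peptides regardless of the random seed.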

After an in-depth discussion with Rui (instead of the quick one we had before including this feature/bug), we found that this biases the decoy database towards containing fewer short peptides, which makes it harder to match missed-cleavage peptides. It also skews the precursor mass distribution, because the removed peptides are the shorter ones, which are more likely to match a target. This is bad; I will investigate why plasma is hurt so much more by plotting some scores, and push a fix.
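One way to see why shorter peptides are hit hardest: assuming the C-terminal K/R is held fixed and the remaining residues are permuted, the number of distinct decoy sequences is the multinomial coefficient n!/(c1!·c2!·…) over the n non-C-terminal residues. Short peptides have very few arrangements, so a single shuffle is far more likely to reproduce the target. The helper `distinct_shuffles` below is a hypothetical illustration of that counting argument, not code from the pipeline.

```python
from collections import Counter
from math import factorial


def distinct_shuffles(peptide: str) -> int:
    """Number of distinct sequences obtainable by permuting every residue
    except the C-terminal one (assumed fixed so the decoy stays tryptic).
    Computed as the multinomial coefficient n! / prod(c_i!)."""
    body = peptide[:-1]
    n = factorial(len(body))
    for count in Counter(body).values():
        n //= factorial(count)  # each partial quotient is an exact integer
    return n


for pep in ["AK", "LGK", "AAAK", "SAMPLER", "PEPTIDEK"]:
    n = distinct_shuffles(pep)
    # Probability that one uniform shuffle reproduces the target itself.
    print(f"{pep:<10} {len(pep):>2} aa {n:>5} arrangements  P(self) = {1 / n:.4f}")
```

A 2-residue peptide always shuffles back to itself, while an 8-residue one like `PEPTIDEK` has over a thousand arrangements; collisions with other short targets push the removal rate for short peptides up further.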

glormph commented 1 year ago

plasma_ov_normal_filtered.pdf

Tests show a serious impact on the target/decoy distributions, and the effect is even more apparent in plasma data.