smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Filtering refactor #2285

Closed Alexander-Sol closed 1 year ago

Alexander-Sol commented 1 year ago

Previously, PSM filtering happened in several different places in PostSearchAnalysisTask, which led to inconsistencies. This PR introduces three functions that work together to create a single list of PSMs, _filteredPsms, filtered to a user-specified q-value. This list is used for quantification, spectral library generation/updating, and results writing, and is accessed through the GetFilteredPsms method. This PR does not affect protein parsimony: filtering for parsimony is still hard-coded in PostSearchAnalysisTask.ProteinAnalysis.
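
As a rough, language-agnostic illustration of the "filter once, reuse everywhere" idea (written in Python, not the project's C#), the sketch below builds one filtered list and hands the same list to each consumer. The `Psm` class, `filter_psms` helper, the 0.01 threshold, and the `<=` comparison are all assumptions made for the example and are not taken from the PR.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    # Hypothetical minimal PSM record; the real class has many more fields.
    full_sequence: str
    q_value: float


def filter_psms(psms, q_value_threshold):
    """Return the PSMs whose q-value is at or below the threshold (assumed rule)."""
    return [p for p in psms if p.q_value <= q_value_threshold]


all_psms = [Psm("PEPTIDE", 0.001), Psm("PEPTIDER", 0.02), Psm("SEQUENCE", 0.005)]

# Filter once, then give the same list to every downstream consumer,
# so quantification, library writing, and result writing cannot disagree.
filtered_psms = filter_psms(all_psms, q_value_threshold=0.01)
psms_for_quantification = filtered_psms
psms_for_spectral_library = filtered_psms
psms_for_results = filtered_psms
```

Because all three consumers share one list, a change to the filtering rule only has to be made in one place.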

This PR also fixes an issue where file-specific results gave incorrect information for PSMs and peptides. The cause was that file-specific FDR analysis used a list of PSMs that had already been filtered. PostSearchAnalysisTaskTests.AllResultsAndResultsTxtTests was modified to test for this specific inconsistency.
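
To see why the input list matters, here is a toy Python illustration (not the project's code) of a generic target-decoy q-value calculation run on one file's PSMs, first on the full list and then on a list that was already truncated by an earlier filter. The scores, the 0.3 threshold, and the helper names are invented for the example, and the q-value formula is a simplified textbook version rather than the one MetaMorpheus uses.

```python
def q_values(psms):
    """psms: list of (score, is_decoy). Returns [((score, is_decoy), q_value), ...]
    ordered by descending score, using a simplified decoys/targets estimate."""
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value = lowest FDR seen at this score or at any lower score
    running_min = float("inf")
    result = []
    for psm, fdr in zip(reversed(ordered), reversed(fdrs)):
        running_min = min(running_min, fdr)
        result.append((psm, running_min))
    result.reverse()
    return result


def targets_passing(psms, threshold):
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(1 for (_, is_decoy), q in q_values(psms) if not is_decoy and q <= threshold)


# One file's PSMs: (score, is_decoy). Values are made up.
file_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]

# Simulate an earlier filtering step that already dropped the lowest-scoring PSM.
already_filtered = [p for p in file_psms if p[0] >= 7.0]

full_count = targets_passing(file_psms, threshold=0.3)
truncated_count = targets_passing(already_filtered, threshold=0.3)
```

In this toy case the two counts differ (4 targets pass from the full list, 2 from the truncated one), which is the kind of discrepancy that can appear when per-file statistics start from an already-filtered list.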

CommonParameters was modified by removing the QValueOutputFilter and PepQValueOutputFilter properties. References to these properties were removed from the PostGlycoSearchAnalysisTask. QValueThreshold and PepQValueThreshold were created in their place. These new properties define which PSMs are used for quantification and spectral library generation.

A new property was introduced to SearchParameters, WriteHighQValuePsms. This enables users to specify whether or not high q-value results should be written and is consistent in name and usage with the existing properties WriteDecoys and WriteContaminants.
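
A minimal Python sketch of how such a set of write flags could combine is shown below. It is an assumption about the semantics, not the project's implementation: the `psms_to_write` function and the dictionary keys are hypothetical, and only the three flag names mirror the properties named above.

```python
def psms_to_write(psms, q_value_threshold, write_high_q_value_psms,
                  write_decoys, write_contaminants):
    """Select the PSMs that would be written, under assumed flag semantics."""
    selected = []
    for psm in psms:
        if psm["is_decoy"] and not write_decoys:
            continue
        if psm["is_contaminant"] and not write_contaminants:
            continue
        if psm["q_value"] > q_value_threshold and not write_high_q_value_psms:
            continue
        selected.append(psm)
    return selected


example_psms = [
    {"id": "a", "q_value": 0.001, "is_decoy": False, "is_contaminant": False},
    {"id": "b", "q_value": 0.200, "is_decoy": False, "is_contaminant": False},
    {"id": "c", "q_value": 0.002, "is_decoy": True, "is_contaminant": False},
    {"id": "d", "q_value": 0.003, "is_decoy": False, "is_contaminant": True},
]

strict = psms_to_write(example_psms, 0.01, write_high_q_value_psms=False,
                       write_decoys=False, write_contaminants=False)
everything = psms_to_write(example_psms, 0.01, write_high_q_value_psms=True,
                           write_decoys=True, write_contaminants=True)
```

Treating the three flags as independent switches keeps each one easy to reason about: turning one on never hides PSMs that another flag would have allowed.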

Finally, the GUI was updated to reflect these changes.

References to the QValueOutputFilter and PepQValueOutputFilter properties of CommonParameters were removed from the PostGlycoSearchAnalysisTask. These properties were never integrated into the GlycoSearchTaskWindow, so they could never be modified by the user and were always set to the default value of 1.0.

Changes to Generate/Update SpectralLibrary: Previously, spectral library writing and updating ran slowly. The bottleneck was grouping PSMs according to their sequence and charge state, which had an algorithmic complexity of O(n^2). I refactored this section of the code using LINQ's IGrouping. It now runs in O(n) time.
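
The difference between the two grouping strategies can be sketched in Python (an illustration only; the PR's actual change is in C# with LINQ). The function names and the tuple layout `(sequence, charge, scan)` are hypothetical. The first version rescans the existing groups for every PSM, which is quadratic in the worst case; the second uses a hash-based dictionary keyed on (sequence, charge), which is linear on average.

```python
from collections import defaultdict


def group_quadratic(psms):
    """Group by (sequence, charge) with a linear scan per PSM: O(n^2) worst case."""
    groups = []  # list of (key, members)
    for psm in psms:
        key = (psm[0], psm[1])
        for existing_key, members in groups:
            if existing_key == key:
                members.append(psm)
                break
        else:
            groups.append((key, [psm]))
    return groups


def group_linear(psms):
    """Group by (sequence, charge) with a hash map: O(n) expected time."""
    groups = defaultdict(list)
    for psm in psms:
        groups[(psm[0], psm[1])].append(psm)
    return groups


# (sequence, charge, scan number); values are made up.
example = [
    ("PEPTIDE", 2, 101),
    ("PEPTIDE", 3, 102),
    ("PEPTIDE", 2, 103),
    ("SEQUENCE", 2, 104),
]

slow = dict(group_quadratic(example))
fast = dict(group_linear(example))
```

Both functions produce the same groups; only the lookup cost per PSM changes.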

codecov[bot] commented 1 year ago

Codecov Report

Merging #2285 (e15bfb6) into master (2fc4f6e) will increase coverage by 0.06%. The diff coverage is 93.11%.

@@            Coverage Diff             @@
##           master    #2285      +/-   ##
==========================================
+ Coverage   91.95%   92.01%   +0.06%     
==========================================
  Files         135      135              
  Lines       20481    20601     +120     
  Branches     2853     2827      -26     
==========================================
+ Hits        18833    18957     +124     
- Misses       1160     1161       +1     
+ Partials      488      483       -5     

Impacted Files                                          Coverage Δ
...taMorpheus/TaskLayer/SearchTask/MzIdentMLWriter.cs   99.43% <ø> (+0.13%)
...yer/GlycoSearchTask/PostGlycoSearchAnalysisTask.cs   93.19% <66.66%> (+0.43%)
...eus/TaskLayer/SearchTask/PostSearchAnalysisTask.cs   93.36% <92.39%> (+0.31%)
MetaMorpheus/EngineLayer/CommonParameters.cs            95.16% <100.00%> (+0.89%)
...us/TaskLayer/MbrAnalysis/SpectralRecoveryRunner.cs   92.92% <100.00%> (+1.41%)
MetaMorpheus/TaskLayer/MetaMorpheusTask.cs              87.28% <100.00%> (+0.01%)
MetaMorpheus/TaskLayer/PepXMLWriter.cs                  97.36% <100.00%> (+0.63%)
...skLayer/SearchTask/PostSearchAnalysisParameters.cs   100.00% <100.00%> (ø)
...aMorpheus/TaskLayer/SearchTask/SearchParameters.cs   100.00% <100.00%> (ø)
MetaMorpheus/TaskLayer/SearchTask/SearchTask.cs         94.75% <100.00%> (ø)