Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.
My plan is to additionally run OpenMS' QCCalculator in (almost) every step to create small mzQC files with additional summaries.
Those mzQC files should contain only information that cannot be read from the final mzTab.
This would also let us skip copying the input mzMLs to the pmultiqc step, since pmultiqc would only need to read the already summarized data from the mzQC files.
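For metrics that we compute ourselves (outside QCCalculator), a small per-run mzQC file could be written roughly like this. This is a minimal sketch, assuming the layout loosely follows the mzQC 1.0 JSON structure; it does not include every field the schema requires and is not schema-validated. `build_run_mzqc` and `write_mzqc` are hypothetical helpers, and the metric and software accessions passed in are placeholders, not real CV terms (only `MS:1000584`, the PSI-MS term for the mzML format, is a real accession).

```python
import json
from datetime import datetime, timezone


def build_run_mzqc(mzml_name, metrics, software):
    """Build a minimal mzQC-style dict for one run.

    metrics:  list of (accession, name, value) tuples
    software: dict with "accession", "name" and "version" keys
    The layout loosely follows mzQC 1.0 JSON; it is NOT schema-validated.
    """
    return {
        "mzQC": {
            "version": "1.0.0",
            "creationDate": datetime.now(timezone.utc).isoformat(),
            "runQualities": [{
                "metadata": {
                    "inputFiles": [{
                        "location": mzml_name,
                        "name": mzml_name,
                        # PSI-MS CV term for the mzML file format
                        "fileFormat": {"accession": "MS:1000584", "name": "mzML format"},
                    }],
                    "analysisSoftware": [software],
                },
                "qualityMetrics": [
                    {"accession": acc, "name": name, "value": value}
                    for acc, name, value in metrics
                ],
            }],
        }
    }


def write_mzqc(doc, path):
    """Serialize the mzQC dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(doc, fh, indent=2)
```

Keeping one small file per run and step means pmultiqc would only have to merge a handful of JSON documents instead of re-reading the mzMLs.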
@ypriverol @timosachsenberg, please list in the comments the places and metrics that we need to extract.
mzMLs (run QCCalculator during the mzML indexing step?):
Export all metrics that our QC classes can compute
Export the number of spectra per file
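If the per-file spectrum count is not already covered by QCCalculator, it is cheap to get while the mzML is being streamed anyway. A stdlib-only sketch, assuming standard mzML where each `<spectrum>` carries its "ms level" cvParam (`MS:1000511`) as a direct child; `count_spectra` is a hypothetical helper, not an OpenMS function.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

MS_LEVEL = "MS:1000511"  # PSI-MS CV accession for "ms level"


def _local_name(tag):
    # Drop the XML namespace, e.g. "{http://psi.hupo.org/ms/mzml}spectrum" -> "spectrum"
    return tag.rsplit("}", 1)[-1]


def count_spectra(source):
    """Count spectra in an mzML (path, file object or raw bytes), in total and per MS level."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    total = 0
    per_level = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local_name(elem.tag) != "spectrum":
            continue
        total += 1
        level = None  # stays None if the spectrum has no "ms level" cvParam
        for child in elem:
            if _local_name(child.tag) == "cvParam" and child.get("accession") == MS_LEVEL:
                level = int(child.get("value"))
                break
        per_level[level] += 1
        elem.clear()  # drop the spectrum's children (e.g. binary arrays) once counted
    return total, dict(per_level)
```

Streaming with `iterparse` keeps memory flat even for large raw files, which matters if this runs inside the indexing step.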
idXMLs (per search engine):
score distributions, target vs. decoy
Which scores to export?
Best hit only?
Histogram or full density?
number of targets vs. decoys
hits per PSM?
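To make the open questions (which score, best hit only, histogram vs. density) concrete, here is a sketch of the per-engine summaries, assuming PSMs have already been flattened into a hypothetical `Hit` record (spectrum ID, one chosen score, decoy flag, rank). It uses fixed-width histograms rather than full densities, on the assumption that a few bins per run keep the mzQC small; the binning range is left to the caller because score scales differ between engines.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    """One peptide hit; a hypothetical flattened view of an idXML PSM."""
    spectrum_id: str
    score: float
    is_decoy: bool
    rank: int = 1  # 1 = best hit for its spectrum


def _selected(hits, best_only):
    return [h for h in hits if not best_only or h.rank == 1]


def target_decoy_histograms(hits, lo, hi, n_bins, best_only=True):
    """Fixed-width score histograms over [lo, hi], separately for targets and decoys.

    Scores outside the range are clamped into the first or last bin.
    """
    width = (hi - lo) / n_bins
    target, decoy = [0] * n_bins, [0] * n_bins
    for h in _selected(hits, best_only):
        idx = min(max(int((h.score - lo) / width), 0), n_bins - 1)
        (decoy if h.is_decoy else target)[idx] += 1
    return target, decoy


def target_decoy_counts(hits, best_only=True):
    counts = Counter(h.is_decoy for h in _selected(hits, best_only))
    return {"targets": counts[False], "decoys": counts[True]}


def hits_per_spectrum(hits):
    """Distribution of hits reported per spectrum: {n_hits: n_spectra}."""
    per_spectrum = Counter(h.spectrum_id for h in hits)
    return dict(Counter(per_spectrum.values()))
```

The same functions could be reused unchanged for the post-Percolator/IDPEP idXMLs by passing the new score.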
idXMLs (after Percolator/IDPEP):
target vs. decoy distribution again
idXMLs (after ConsensusID):
overlap between search engines (e.g., a 2D plot for every pair of search engines)
histogram of the number of times a PSM was identified with the same, with different, ...
number of targets vs. decoys
hits per PSM?
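The pairwise overlap and the "same vs. different" histogram could be derived from the per-engine inputs to ConsensusID. A sketch, assuming each engine's results are reduced to a hypothetical `{spectrum_id: best-hit peptide}` mapping; the engine names in any usage are just labels.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(ids_by_engine):
    """ids_by_engine: {engine: {spectrum_id: best-hit peptide sequence}}.

    For every pair of engines, count spectra identified by both with the same
    peptide, by both with different peptides, and by only one of the two.
    """
    result = {}
    for a, b in combinations(sorted(ids_by_engine), 2):
        ids_a, ids_b = ids_by_engine[a], ids_by_engine[b]
        shared = ids_a.keys() & ids_b.keys()
        same = sum(1 for s in shared if ids_a[s] == ids_b[s])
        result[(a, b)] = {
            "same": same,
            "different": len(shared) - same,
            "only_first": len(ids_a.keys() - ids_b.keys()),
            "only_second": len(ids_b.keys() - ids_a.keys()),
        }
    return result


def agreement_histogram(ids_by_engine):
    """{(n_engines, n_distinct_peptides): n_spectra} over all identified spectra."""
    peptides = {}
    for ids in ids_by_engine.values():
        for spectrum_id, peptide in ids.items():
            peptides.setdefault(spectrum_id, []).append(peptide)
    return dict(Counter((len(p), len(set(p))) for p in peptides.values()))
```

Each `(a, b)` entry holds exactly the four counts needed for one panel of a per-pair plot, so only these numbers, not the PSMs themselves, would need to go into the mzQC.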
idXMLs (after filtering):
do we need anything here?
idXMLs (after inference):
see #27
Whether this can be inferred by comparing the mzTab with the raw IDs per file depends a bit on the order of FDR filtering. Currently we FDR-filter before quantification, so it might indeed be helpful to know whether a protein is missing because of filtering after inference or because of missing quant data.
In any case, we need this information, since by default we also filter out decoys, and a target-decoy score distribution plot would be helpful for proteins as well.
For TMT, the inference idXML is easily accessible.
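Distinguishing "removed by FDR filtering" from "no quant data" only needs three accession sets per run. A sketch, assuming we can collect the proteins after inference, those that pass protein-level FDR, and those quantified in the final mzTab; the `DECOY_` prefix is an assumed decoy naming convention, and `classify_proteins` is a hypothetical helper.

```python
from collections import Counter


def classify_proteins(inferred, passed_fdr, quantified, decoy_prefix="DECOY_"):
    """Label every inferred target protein by where it was lost, if at all.

    inferred:   accessions present after protein inference (before FDR filtering)
    passed_fdr: accessions that survived protein-level FDR filtering
    quantified: accessions that ended up with quantities in the final mzTab
    """
    status = {}
    for acc in inferred:
        if acc.startswith(decoy_prefix):
            continue  # decoys are removed by default; summarize them separately
        if acc in quantified:
            status[acc] = "quantified"
        elif acc not in passed_fdr:
            status[acc] = "removed_by_fdr"
        else:
            status[acc] = "no_quant_data"
    return status


def summarize_status(status):
    """Collapse per-protein labels into counts for the report."""
    return dict(Counter(status.values()))
```

The decoy proteins skipped here are exactly the ones the target-decoy score distribution for proteins would need, so they could be routed into the same histogram code used for PSMs.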
features:
since we only generate features internally for ProteomicsLFQ, we must export summarized feature QC metrics during execution (or write out the temporary featureXMLs even without debug mode).
for TMT, this does not really exist, because the "consensus" features are not really 2D features
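Since the features only live inside ProteomicsLFQ, the actual export would have to happen in the C++ tool; the Python sketch below only illustrates which compact summaries could replace writing out full featureXMLs. The `Feature` record and the chosen metrics (count, median intensity, median RT width, charge distribution) are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Feature:
    """Hypothetical minimal view of one LFQ feature."""
    rt_start: float
    rt_end: float
    intensity: float
    charge: int


def summarize_features(features):
    """Per-run feature summary small enough to store in an mzQC file."""
    if not features:
        return {"n_features": 0}
    return {
        "n_features": len(features),
        "median_intensity": median(f.intensity for f in features),
        "median_rt_width": median(f.rt_end - f.rt_start for f in features),
        "charge_counts": dict(Counter(f.charge for f in features)),
    }
```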
consensus features:
is there anything important that is not available in the mzTab?