vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

difference between results using library-free and its generated library search #690

Open animesh opened 1 year ago

animesh commented 1 year ago

I am trying to compare results between a library-free search (report.pg_matrix.tsv.txt):

diann.exe --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d
" --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
" --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
" --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
" --lib "" --threads 4 --verbose 1 --out "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.tsv" --qvalue 0.01 --matrices  --out-lib "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv" --gen-spec-lib --predictor --prosit --fasta "C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606.fasta" --fasta "C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606_additional.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K*,R* --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --reanalyse --relaxed-prot-inf --smart-profiling --pg-level 0 --peak-center --no-ifs-removal 
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Sat May  6 13:29:53 2023
CPU: GenuineIntel Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2 
Logical CPU cores: 24
Thread number set to 4
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 1
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 5
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Implicit protein grouping: isoform IDs; this determines which peptides are considered 'proteotypic' and thus affects protein FDR calculation
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:1 

4 files will be processed
[0:00] Loading FASTA C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606.fasta
[0:03] Loading FASTA C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606_additional.fasta
[2:34] Processing FASTA
[2:56] Assembling elution groups
[3:32] 7869801 precursors generated
[3:46] Prosit input saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.prosit.csv
[3:48] Gene names missing for some isoforms
[3:48] Library contains 81433 proteins, and 20514 genes
[3:49] Encoding peptides for spectra and RTs prediction
[4:14] Predicting spectra and IMs
[120:27] Predicting RTs
[135:40] Decoding predicted spectra and IMs
[136:36] Decoding RTs
[136:44] Saving the library to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.predicted.speclib
[137:16] Initialising library

[137:23] First pass: generating a spectral library from DIA data
[137:23] File #1/4
[137:23] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d
For most diaPASEF datasets it is better to manually fix both the MS1 and MS2 mass accuracies to values in the range 10-15 ppm.
[138:39] 5571791 library precursors are potentially detectable
[138:40] Processing...
[576:39] RT window set to 2.46669
[576:39] Peak width: 0
[576:39] Scan window radius set to 5
[576:41] Recommended MS1 mass accuracy setting: 10.4353 ppm
[1415:50] Optimised mass accuracy: 3.24793 ppm
[1855:19] Removing low confidence identifications
[1855:20] Searching PTM decoys
[1863:37] Removing interfering precursors
[1863:39] Too few confident identifications, neural networks will not be used
[1863:39] Number of IDs at 0.01 FDR: 0
[1863:39] Number of IDs at 0.01 FDR: 0
[1863:39] Calculating protein q-values
[1863:40] Number of protein isoforms identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[1863:40] Quantification
[1863:41] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d.quant.

[1863:45] File #2/4
[1863:45] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
[1864:57] 5571791 library precursors are potentially detectable
[1864:57] Processing...
[2046:02] RT window set to 1.80105
[2046:02] Ion mobility window set to 0.0322915
[2046:03] Recommended MS1 mass accuracy setting: 12.285 ppm
[2066:47] Removing low confidence identifications
[2066:47] Searching PTM decoys
[2067:19] Removing interfering precursors
[2067:22] Training neural networks: 11928 targets, 7259 decoys
[2067:25] Number of IDs at 0.01 FDR: 2700
[2067:25] Calculating protein q-values
[2067:26] Number of protein isoforms identified at 1% FDR: 224 (precursor-level), 176 (protein-level) (inference performed using proteotypic peptides only)
[2067:26] Quantification
[2067:26] Precursors with monitored PTMs at 1% FDR: 0 out of 12
[2067:26] Unmodified precursors with monitored PTM sites at 1% FDR: 0 out of 19
[2067:27] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d.quant.

[2067:30] File #3/4
[2067:30] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
[2069:00] 5571791 library precursors are potentially detectable
[2069:01] Processing...
[2092:52] RT window set to 1.95672
[2092:52] Ion mobility window set to 0.0374793
[2092:53] Recommended MS1 mass accuracy setting: 11.8816 ppm
[2114:59] Removing low confidence identifications
[2114:59] Searching PTM decoys
[2115:49] Removing interfering precursors
[2115:54] Training neural networks: 62365 targets, 37155 decoys
[2116:04] Number of IDs at 0.01 FDR: 25664
[2116:05] Calculating protein q-values
[2116:06] Number of protein isoforms identified at 1% FDR: 1410 (precursor-level), 1195 (protein-level) (inference performed using proteotypic peptides only)
[2116:06] Quantification
[2116:07] Precursors with monitored PTMs at 1% FDR: 181 out of 190
[2116:07] Unmodified precursors with monitored PTM sites at 1% FDR: 149 out of 155
[2116:08] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d.quant.

[2116:12] File #4/4
[2116:12] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
[2118:57] 5571791 library precursors are potentially detectable
[2118:58] Processing...
[2133:22] RT window set to 2.11759
[2133:22] Ion mobility window set to 0.044546
[2133:22] Recommended MS1 mass accuracy setting: 13.4659 ppm
[2178:45] Removing low confidence identifications
[2178:45] Searching PTM decoys
[2180:35] Removing interfering precursors
[2180:42] Training neural networks: 113368 targets, 63600 decoys
[2181:01] Number of IDs at 0.01 FDR: 45281
[2181:03] Calculating protein q-values
[2181:03] Number of protein isoforms identified at 1% FDR: 2132 (precursor-level), 1780 (protein-level) (inference performed using proteotypic peptides only)
[2181:03] Quantification
[2181:05] Precursors with monitored PTMs at 1% FDR: 358 out of 370
[2181:05] Unmodified precursors with monitored PTM sites at 1% FDR: 317 out of 332
[2181:07] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d.quant.

[2181:14] Cross-run analysis
[2181:14] Reading quantification information: 4 files
[2181:15] Quantifying peptides
[2181:16] Assembling protein groups
[2181:26] Quantifying proteins
[2181:27] Calculating q-values for protein and gene groups
[2181:28] Calculating global q-values for protein and gene groups
[2181:28] Writing report
[2181:34] Report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.tsv.
[2181:34] Saving precursor levels matrix
[2181:35] Precursor levels matrix (1% precursor and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.pr_matrix.tsv.
[2181:35] Saving protein group levels matrix
[2181:35] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.pg_matrix.tsv.
[2181:35] Saving gene group levels matrix
[2181:35] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.gg_matrix.tsv.
[2181:35] Saving unique genes levels matrix
[2181:35] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.unique_genes_matrix.tsv.
[2181:35] Stats report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.stats.tsv
[2181:35] Generating spectral library:
[2181:35] 55713 precursors passing the FDR threshold are to be extracted
[2181:35] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
[2182:49] 5571791 library precursors are potentially detectable
[2182:50] 217 spectra added to the library
[2182:52] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
[2184:21] 5571791 library precursors are potentially detectable
[2184:22] 9665 spectra added to the library
[2184:27] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
[2187:16] 5571791 library precursors are potentially detectable
[2187:21] 30363 spectra added to the library
[2187:28] Saving spectral library to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv
[2187:40] 55713 precursors saved
[2187:40] Loading the generated library and saving it in the .speclib format
[2187:40] Loading spectral library C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv
[2187:46] Spectral library loaded: 37767 protein isoforms, 19087 protein groups and 55713 precursors in 49297 elution groups.
[2187:46] Loading protein annotations from FASTA C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606.fasta
[2187:47] Loading protein annotations from FASTA C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606_additional.fasta
[2187:51] Gene names missing for some isoforms
[2187:51] Library contains 29010 proteins, and 7427 genes
[2187:51] Saving the library to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv.speclib

[2188:00] Second pass: using the newly created spectral library to reanalyse the data
[2188:00] File #1/4
[2188:00] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d
[2189:06] 55713 library precursors are potentially detectable
[2189:06] Processing...
[2193:35] RT window set to 0.463218
[2193:35] Recommended MS1 mass accuracy setting: 12.682 ppm
[2198:07] Removing low confidence identifications
[2198:07] Searching PTM decoys
[2198:08] Removing interfering precursors
[2198:09] Training neural networks: 44117 targets, 47284 decoys
[2198:18] Number of IDs at 0.01 FDR: 0
[2198:18] Calculating protein q-values
[2198:18] Number of protein isoforms identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[2198:18] Quantification

[2198:21] File #2/4
[2198:21] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
[2199:29] 55713 library precursors are potentially detectable
[2199:29] Processing...
[2200:27] RT window set to 0.47253
[2200:27] Ion mobility window set to 0.0199702
[2200:27] Recommended MS1 mass accuracy setting: 14.6229 ppm
[2200:31] Removing low confidence identifications
[2200:31] Searching PTM decoys
[2200:31] Removing interfering precursors
[2200:32] Training neural networks: 7521 targets, 1593 decoys
[2200:33] Number of IDs at 0.01 FDR: 6036
[2200:33] Calculating protein q-values
[2200:33] Number of protein isoforms identified at 1% FDR: 474 (precursor-level), 108 (protein-level) (inference performed using proteotypic peptides only)
[2200:33] Quantification
[2200:33] Precursors with monitored PTMs at 1% FDR: 0 out of 48
[2200:33] Unmodified precursors with monitored PTM sites at 1% FDR: 0 out of 25

[2200:36] File #3/4
[2200:36] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
[2202:01] 55713 library precursors are potentially detectable
[2202:01] Processing...
[2202:20] RT window set to 0.632162
[2202:20] Ion mobility window set to 0.0170957
[2202:20] Recommended MS1 mass accuracy setting: 12.2618 ppm
[2202:28] Removing low confidence identifications
[2202:28] Searching PTM decoys
[2202:28] Removing interfering precursors
[2202:30] Training neural networks: 34271 targets, 18774 decoys
[2202:35] Number of IDs at 0.01 FDR: 29093
[2202:36] Calculating protein q-values
[2202:36] Number of protein isoforms identified at 1% FDR: 1509 (precursor-level), 1125 (protein-level) (inference performed using proteotypic peptides only)
[2202:36] Quantification
[2202:37] Precursors with monitored PTMs at 1% FDR: 182 out of 212
[2202:37] Unmodified precursors with monitored PTM sites at 1% FDR: 157 out of 174

[2202:42] File #4/4
[2202:42] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
[2205:26] 55713 library precursors are potentially detectable
[2205:26] Processing...
[2205:34] RT window set to 0.818429
[2205:34] Ion mobility window set to 0.0128255
[2205:34] Recommended MS1 mass accuracy setting: 14.6023 ppm
[2205:47] Removing low confidence identifications
[2205:47] Searching PTM decoys
[2205:47] Removing interfering precursors
[2205:49] Training neural networks: 54119 targets, 40646 decoys
[2205:58] Number of IDs at 0.01 FDR: 45752
[2205:59] Calculating protein q-values
[2205:59] Number of protein isoforms identified at 1% FDR: 2027 (precursor-level), 1737 (protein-level) (inference performed using proteotypic peptides only)
[2205:59] Quantification
[2206:00] Precursors with monitored PTMs at 1% FDR: 324 out of 342
[2206:00] Unmodified precursors with monitored PTM sites at 1% FDR: 292 out of 322

[2206:08] Cross-run analysis
[2206:08] Reading quantification information: 4 files
[2206:09] Quantifying peptides
[2206:10] Quantifying proteins
[2206:11] Calculating q-values for protein and gene groups
[2206:11] Calculating global q-values for protein and gene groups
[2206:12] Writing report
[2206:17] Report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.tsv.
[2206:17] Saving precursor levels matrix
[2206:18] Precursor levels matrix (1% precursor and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.pr_matrix.tsv.
[2206:18] Saving protein group levels matrix
[2206:18] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.pg_matrix.tsv.
[2206:18] Saving gene group levels matrix
[2206:18] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.gg_matrix.tsv.
[2206:18] Saving unique genes levels matrix
[2206:18] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.unique_genes_matrix.tsv.
[2206:18] Stats report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.stats.tsv
[2206:18] Log saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.log.txt
Finished

DIA-NN exited
DIA-NN-plotter.exe "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.stats.tsv" "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.tsv" "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.pdf"
PDF report will be generated in the background

and a search using the library generated by the previous search (report.pg_matrix.tsv.txt):

diann.exe --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d
" --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
" --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
" --f "C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
" --lib "C:\Users\animeshs\230504_hela_test\DIA\DIANNlibFree\report-lib.predicted.speclib" --threads 4 --verbose 1 --out "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.tsv" --qvalue 0.01 --matrices  --out-lib "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv" --gen-spec-lib --prosit --var-mods 5 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --reanalyse --relaxed-prot-inf --smart-profiling --pg-level 0 --peak-center --no-ifs-removal 
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Mon May  8 10:43:04 2023
CPU: GenuineIntel Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2 
Logical CPU cores: 24
Thread number set to 4
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Maximum number of variable modifications set to 5
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Implicit protein grouping: isoform IDs; this determines which peptides are considered 'proteotypic' and thus affects protein FDR calculation
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
The following variable modifications will be scored: UniMod:1 
Unless the spectral library specified was created by this version of DIA-NN, it's strongly recommended to specify a FASTA database and use the 'Reannotate' function to allow DIA-NN to identify peptides which can originate from the N/C terminus of the protein: otherwise site localisation might not work properly for modifications of the protein N-terminus or for modifications which do not allow enzymatic cleavage after the modified residue

4 files will be processed
[0:00] Loading spectral library C:\Users\animeshs\230504_hela_test\DIA\DIANNlibFree\report-lib.predicted.speclib
[0:20] Library annotated with sequence database(s): C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606.fasta; C:\Users\animeshs\MaxQuant 2.4.0.0\homo_sapiens\UP000005640_9606_additional.fasta
[0:20] Gene names missing for some isoforms
[0:20] Library contains 81433 proteins, and 20514 genes
[0:23] Spectral library loaded: 103414 protein isoforms, 172582 protein groups and 7869801 precursors in 2450715 elution groups.
[0:23] Preparing Prosit input from the spectral library provided
[0:35] Prosit input saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.prosit.csv
[0:54] Initialising library

[1:01] First pass: generating a spectral library from DIA data
[1:01] File #1/4
[1:01] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d
For most diaPASEF datasets it is better to manually fix both the MS1 and MS2 mass accuracies to values in the range 10-15 ppm.
[2:16] 5571791 library precursors are potentially detectable
[2:17] Processing...
[457:57] RT window set to 2.46669
[457:57] Peak width: 0
[457:57] Scan window radius set to 5
[457:58] Recommended MS1 mass accuracy setting: 10.4353 ppm
[1299:10] Optimised mass accuracy: 3.24793 ppm
[1754:33] Removing low confidence identifications
[1754:34] Searching PTM decoys
[1763:38] Removing interfering precursors
[1763:40] Too few confident identifications, neural networks will not be used
[1763:40] Number of IDs at 0.01 FDR: 0
[1763:41] Number of IDs at 0.01 FDR: 0
[1763:41] Calculating protein q-values
[1763:41] Number of protein isoforms identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[1763:42] Quantification
[1763:43] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d.quant.

[1763:46] File #2/4
[1763:46] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
[1765:03] 5571791 library precursors are potentially detectable
[1765:04] Processing...
[1949:07] RT window set to 1.80105
[1949:07] Ion mobility window set to 0.0322915
[1949:08] Recommended MS1 mass accuracy setting: 12.285 ppm
[1969:40] Removing low confidence identifications
[1969:40] Searching PTM decoys
[1970:13] Removing interfering precursors
[1970:15] Training neural networks: 11928 targets, 7259 decoys
[1970:18] Number of IDs at 0.01 FDR: 2700
[1970:18] Calculating protein q-values
[1970:19] Number of protein isoforms identified at 1% FDR: 224 (precursor-level), 176 (protein-level) (inference performed using proteotypic peptides only)
[1970:19] Quantification
[1970:20] Precursors with monitored PTMs at 1% FDR: 0 out of 12
[1970:20] Unmodified precursors with monitored PTM sites at 1% FDR: 0 out of 19
[1970:21] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d.quant.

[1970:22] File #3/4
[1970:22] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
[1971:51] 5571791 library precursors are potentially detectable
[1971:52] Processing...
[1995:51] RT window set to 1.95672
[1995:51] Ion mobility window set to 0.0374793
[1995:51] Recommended MS1 mass accuracy setting: 11.8816 ppm
[2018:54] Removing low confidence identifications
[2018:54] Searching PTM decoys
[2019:45] Removing interfering precursors
[2019:50] Training neural networks: 62365 targets, 37155 decoys
[2020:00] Number of IDs at 0.01 FDR: 25664
[2020:01] Calculating protein q-values
[2020:02] Number of protein isoforms identified at 1% FDR: 1410 (precursor-level), 1195 (protein-level) (inference performed using proteotypic peptides only)
[2020:02] Quantification
[2020:03] Precursors with monitored PTMs at 1% FDR: 181 out of 190
[2020:03] Unmodified precursors with monitored PTM sites at 1% FDR: 149 out of 155
[2020:04] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d.quant.

[2020:06] File #4/4
[2020:06] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
[2022:51] 5571791 library precursors are potentially detectable
[2022:52] Processing...
[2037:29] RT window set to 2.11759
[2037:29] Ion mobility window set to 0.044546
[2037:30] Recommended MS1 mass accuracy setting: 13.4659 ppm
[2082:42] Removing low confidence identifications
[2082:43] Searching PTM decoys
[2084:34] Removing interfering precursors
[2084:41] Training neural networks: 113368 targets, 63600 decoys
[2085:02] Number of IDs at 0.01 FDR: 45281
[2085:03] Calculating protein q-values
[2085:03] Number of protein isoforms identified at 1% FDR: 2132 (precursor-level), 1780 (protein-level) (inference performed using proteotypic peptides only)
[2085:04] Quantification
[2085:05] Precursors with monitored PTMs at 1% FDR: 358 out of 370
[2085:05] Unmodified precursors with monitored PTM sites at 1% FDR: 317 out of 332
[2085:08] Quantification information saved to C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d.quant.

[2085:12] Cross-run analysis
[2085:12] Reading quantification information: 4 files
[2085:12] Quantifying peptides
[2085:14] Assembling protein groups
[2085:23] Quantifying proteins
[2085:25] Calculating q-values for protein and gene groups
[2085:25] Calculating global q-values for protein and gene groups
[2085:26] Writing report
[2085:32] Report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.tsv.
[2085:32] Saving precursor levels matrix
[2085:32] Precursor levels matrix (1% precursor and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.pr_matrix.tsv.
[2085:32] Saving protein group levels matrix
[2085:32] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.pg_matrix.tsv.
[2085:32] Saving gene group levels matrix
[2085:32] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.gg_matrix.tsv.
[2085:32] Saving unique genes levels matrix
[2085:32] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.unique_genes_matrix.tsv.
[2085:32] Stats report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-first-pass.stats.tsv
[2085:32] Generating spectral library:
[2085:32] 55713 precursors passing the FDR threshold are to be extracted
[2085:32] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
[2086:45] 5571791 library precursors are potentially detectable
[2086:45] 217 spectra added to the library
[2086:47] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
[2088:16] 5571791 library precursors are potentially detectable
[2088:18] 9665 spectra added to the library
[2088:20] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
[2091:10] 5571791 library precursors are potentially detectable
[2091:16] 30363 spectra added to the library
[2091:19] Saving spectral library to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv
[2091:31] 55713 precursors saved
[2091:31] Loading the generated library and saving it in the .speclib format
[2091:31] Loading spectral library C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv
[2091:36] Spectral library loaded: 37767 protein isoforms, 19087 protein groups and 55713 precursors in 49297 elution groups.
[2091:36] Protein names missing for some isoforms
[2091:36] Gene names missing for some isoforms
[2091:36] Library contains 0 proteins, and 0 genes
[2091:36] Saving the library to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report-lib.tsv.speclib

[2091:44] Second pass: using the newly created spectral library to reanalyse the data
[2091:44] File #1/4
[2091:44] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p001_Slot1-21_1_4411.d
[2092:50] 55713 library precursors are potentially detectable
[2092:50] Processing...
[2097:19] RT window set to 0.463218
[2097:19] Recommended MS1 mass accuracy setting: 12.682 ppm
[2101:50] Removing low confidence identifications
[2101:50] Searching PTM decoys
[2101:51] Removing interfering precursors
[2101:52] Training neural networks: 44117 targets, 47284 decoys
[2102:01] Number of IDs at 0.01 FDR: 0
[2102:01] Too low number of IDs with NNs: reverting to the linear classifier
[2102:01] Number of IDs at 0.01 FDR: 0
[2102:01] Number of IDs at 0.01 FDR: 0
[2102:01] Calculating protein q-values
[2102:01] Number of protein isoforms identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[2102:01] Quantification

[2102:03] File #2/4
[2102:03] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p01_Slot1-20_1_4413.d
[2103:11] 55713 library precursors are potentially detectable
[2103:11] Processing...
[2104:10] RT window set to 0.47253
[2104:10] Ion mobility window set to 0.0199702
[2104:10] Recommended MS1 mass accuracy setting: 14.6229 ppm
[2104:14] Removing low confidence identifications
[2104:14] Searching PTM decoys
[2104:14] Removing interfering precursors
[2104:15] Training neural networks: 7521 targets, 1593 decoys
[2104:16] Number of IDs at 0.01 FDR: 6051
[2104:16] Calculating protein q-values
[2104:16] Number of protein isoforms identified at 1% FDR: 474 (precursor-level), 108 (protein-level) (inference performed using proteotypic peptides only)
[2104:16] Quantification
[2104:16] Precursors with monitored PTMs at 1% FDR: 0 out of 48
[2104:16] Unmodified precursors with monitored PTM sites at 1% FDR: 0 out of 25

[2104:18] File #3/4
[2104:18] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_0p1_Slot1-19_1_4415.d
[2105:42] 55713 library precursors are potentially detectable
[2105:42] Processing...
[2106:02] RT window set to 0.632162
[2106:02] Ion mobility window set to 0.0170957
[2106:02] Recommended MS1 mass accuracy setting: 12.2618 ppm
[2106:10] Removing low confidence identifications
[2106:10] Searching PTM decoys
[2106:10] Removing interfering precursors
[2106:12] Training neural networks: 34271 targets, 18774 decoys
[2106:17] Number of IDs at 0.01 FDR: 29089
[2106:17] Calculating protein q-values
[2106:17] Number of protein isoforms identified at 1% FDR: 1511 (precursor-level), 1129 (protein-level) (inference performed using proteotypic peptides only)
[2106:17] Quantification
[2106:18] Precursors with monitored PTMs at 1% FDR: 174 out of 210
[2106:18] Unmodified precursors with monitored PTM sites at 1% FDR: 151 out of 174

[2106:21] File #4/4
[2106:21] Loading run C:\Users\animeshs\230504_hela_test\DIA\230502_helaDIA_Slot1-54_1_4417.d
[2109:05] 55713 library precursors are potentially detectable
[2109:05] Processing...
[2109:13] RT window set to 0.818429
[2109:13] Ion mobility window set to 0.0128255
[2109:13] Recommended MS1 mass accuracy setting: 14.6023 ppm
[2109:27] Removing low confidence identifications
[2109:27] Searching PTM decoys
[2109:27] Removing interfering precursors
[2109:29] Training neural networks: 54119 targets, 40646 decoys
[2109:39] Number of IDs at 0.01 FDR: 45771
[2109:40] Calculating protein q-values
[2109:40] Number of protein isoforms identified at 1% FDR: 2027 (precursor-level), 1742 (protein-level) (inference performed using proteotypic peptides only)
[2109:40] Quantification
[2109:41] Precursors with monitored PTMs at 1% FDR: 325 out of 346
[2109:41] Unmodified precursors with monitored PTM sites at 1% FDR: 289 out of 322

[2109:46] Cross-run analysis
[2109:46] Reading quantification information: 4 files
[2109:46] Quantifying peptides
[2109:47] Quantifying proteins
[2109:49] Calculating q-values for protein and gene groups
[2109:49] Calculating global q-values for protein and gene groups
[2109:49] Writing report
[2109:55] Report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.tsv.
[2109:55] Saving precursor levels matrix
[2109:56] Precursor levels matrix (1% precursor and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.pr_matrix.tsv.
[2109:56] Saving protein group levels matrix
[2109:56] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.pg_matrix.tsv.
[2109:56] Saving gene group levels matrix
[2109:56] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.gg_matrix.tsv.
[2109:56] Saving unique genes levels matrix
[2109:56] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.unique_genes_matrix.tsv.
[2109:56] Stats report saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.stats.tsv
[2109:56] Log saved to C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.log.txt
Finished

DIA-NN exited
DIA-NN-plotter.exe "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.stats.tsv" "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.tsv" "C:\Users\animeshs\230504_hela_test\DIA\DIANN\report.pdf"
PDF report will be generated in the background

and despite a couple of IDs missing from each result (compare.txt), the Spearman rank correlation between the results looks to be about 99.9%

image

I am wondering about the differences, specifically the marked P11142. Also, is there a way to reduce this randomness and make the results fully reproducible between the two searches?
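
For anyone who wants to repeat this kind of comparison, here is a minimal sketch using only the Python standard library. It lists the protein groups found in only one of two pg_matrix files and computes a Spearman rank correlation over the shared ones. The key column name `Protein.Group`, the sample column name `run1`, and the toy values are assumptions for illustration, not taken from the attached files; adjust them to match your own matrices.

```python
import csv
import io
import math


def read_matrix(handle, value_col, key="Protein.Group"):
    """Return {protein group: quantity} for one sample column of a tab-separated matrix.

    The key column name is an assumption; empty and NA cells are skipped.
    """
    out = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = row.get(value_col)
        if raw is None or raw in ("", "NA"):
            continue
        out[row[key]] = float(raw)
    return out


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN when it is undefined."""
    if len(x) < 2:
        return float("nan")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def compare(a, b):
    """Report IDs unique to each result and Spearman's rho over the shared IDs."""
    shared = sorted(set(a) & set(b))
    rho = pearson(ranks([a[k] for k in shared]), ranks([b[k] for k in shared]))
    return {
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
        "spearman": rho,
    }


# Toy matrices with made-up IDs and values, standing in for two pg_matrix files.
demo_a = io.StringIO("Protein.Group\trun1\nPG1\t100\nPG2\t250\nPG3\t900\n")
demo_b = io.StringIO("Protein.Group\trun1\nPG1\t110\nPG2\t240\nPG4\t50\n")
print(compare(read_matrix(demo_a, "run1"), read_matrix(demo_b, "run1")))
```

With real files, open each matrix with `open(path, newline="")` and pass the handle to `read_matrix` in place of the `io.StringIO` objects. Spearman's rho is computed here as the Pearson correlation of the ranks, which is the standard definition.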

vdemichev commented 1 year ago

Hi Ani,

Reproducibility is guaranteed if you use the same commands. Creating the library on the fly is not expected to yield results identical to searching with .predicted.speclib; this is for technical reasons. In general, searching with .predicted.speclib is the recommended way.
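
A rough sketch of what the recommended two-step workflow could look like when scripted. It only builds argument lists and does not launch DIA-NN. All flags are the ones that appear in the commands earlier in this thread. The assumptions are that library prediction can be run as a separate step with no raw files supplied, and that the predicted library is written next to the `--out-lib` path with a `.predicted.speclib` ending, as the first log suggests. The helper names and example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def predicted_library_path(out_lib):
    """Map report-lib.tsv to report-lib.predicted.speclib (naming seen in the log above)."""
    p = PureWindowsPath(out_lib)
    return str(p.with_name(p.stem + ".predicted.speclib"))


def build_library_command(fasta_files, out_lib, extra_args=(), exe="diann.exe"):
    """Step 1 (assumed): predict a spectral library from FASTA only, without raw files."""
    cmd = [exe, "--gen-spec-lib", "--predictor", "--fasta-search", "--out-lib", out_lib]
    for fasta in fasta_files:
        cmd += ["--fasta", fasta.strip()]
    # Digest and modification settings (e.g. --cut, --missed-cleavages) go here.
    cmd += list(extra_args)
    return cmd


def build_search_command(raw_files, library, out_report, extra_args=(), exe="diann.exe"):
    """Step 2: search the runs against the saved predicted library."""
    cmd = [exe]
    for raw in raw_files:
        # Strip stray whitespace and newlines; the pasted commands have line breaks in the paths.
        cmd += ["--f", raw.strip()]
    cmd += ["--lib", library, "--out", out_report, "--qvalue", "0.01", "--matrices"]
    cmd += list(extra_args)
    return cmd


# Hypothetical paths, for illustration only.
out_lib = r"C:\out\report-lib.tsv"
step1 = build_library_command([r"C:\db\human.fasta"], out_lib, ["--cut", "K*,R*"])
step2 = build_search_command([r"C:\data\run1.d"], predicted_library_path(out_lib), r"C:\out\report.tsv")
print(step1)
print(step2)
```

Keeping the two commands fixed and re-running step 2 unchanged is what the "same commands" guarantee above would apply to. The lists can be handed to `subprocess.run` once the paths are real.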

Best, Vadim

hguturu commented 1 year ago

Since --reanalyse creates a library and re-searches using that library, how can we recreate the second pass search using the output library from the first pass? E.g., processing a set of files with --reanalyse creates both report-first-pass.tsv and report.tsv (the second pass) and a report-lib.tsv.

If I take the report-lib.tsv and search the same set of files without --reanalyse, I am unable to reproduce the report.tsv. Is there a way to reproduce the second pass? Any suggested flag changes? I assume DIA-NN implicitly does a tighter search in the second pass and we need to somehow enable that search from the command line?
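
When checking whether a standalone search reproduces the second pass, it can help to compare the per-run ID counts from the two logs before looking at the reports. In the two logs posted above, the first-pass counts are identical while the second-pass counts differ slightly (for example 6036 vs 6051 for the second file). A minimal sketch, assuming log lines shaped like the ones in this thread; the function names are hypothetical and runs are keyed by file name only:

```python
import re

PASS_RE = re.compile(r"\] (First|Second) pass:")
RUN_RE = re.compile(r"\] Loading run (.+)$")
IDS_RE = re.compile(r"\] Number of IDs at [\d.]+ FDR: (\d+)")


def ids_per_run(log_text):
    """Return {(pass, run file name): last reported ID count} from a DIA-NN log.

    Logs without pass markers are filed under the pass name "single".
    """
    counts = {}
    current_pass, current_run = "single", None
    for line in log_text.splitlines():
        line = line.strip()
        match = PASS_RE.search(line)
        if match:
            current_pass, current_run = match.group(1).lower(), None
            continue
        match = RUN_RE.search(line)
        if match:
            # Keep only the file name so logs from different folders stay comparable.
            current_run = match.group(1).rsplit("\\", 1)[-1]
            continue
        match = IDS_RE.search(line)
        if match and current_run is not None:
            # The count can be printed more than once per run; the last one wins.
            counts[(current_pass, current_run)] = int(match.group(1))
    return counts


def differing_runs(a, b):
    """Return {(pass, run): (count in a, count in b)} wherever the two logs disagree."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Short excerpts with a shortened, hypothetical run path.
log_a = "\n".join([
    r"[2188:00] Second pass: using the newly created spectral library to reanalyse the data",
    r"[2198:21] Loading run C:\data\run2.d",
    r"[2200:33] Number of IDs at 0.01 FDR: 6036",
])
log_b = log_a.replace("6036", "6051")
print(differing_runs(ids_per_run(log_a), ids_per_run(log_b)))
```

For a search run without --reanalyse there are no pass markers, so its counts land under "single" and need to be matched against the "second" entries of the two-pass log by run name.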