vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

diann-1.8.1 stops without error message after running for hours on Ubuntu 22.04.4 LTS #1000

Open bioinformatic-guy opened 7 months ago

bioinformatic-guy commented 7 months ago

I recently started using Linux for work and wanted to run DIA-NN on our server, which runs Ubuntu 22.04.4 LTS. I installed the .deb version, converted the .wiff files to .mzML, stored them in a directory, and ran diann from there. After a few hours the process simply disappears from the task manager, and I cannot find any error message. I would appreciate any advice from the community.

I have the log file attached.

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Thu Apr 18 17:30:02 2024
Logical CPU cores: 64
Thread number set to 40
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 1
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 1
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Neural networks will be used for peak selection
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:1
WARNING: double-pass mode is incompatible with PTM scoring, turned off
WARNING: MBR turned off, two or more raw files are required

1 files will be processed
[0:00] Loading FASTA /current_data/proteomics_analysis/decoy_uniprot_sprot_human_iRTpep.fasta
[0:09] Processing FASTA
[0:34] Assembling elution groups
[0:56] 11215130 precursors generated
[0:57] Protein names missing for some isoforms
[0:57] Gene names missing for some isoforms
[0:57] Library contains 20362 proteins, and 20142 genes
[1:00] Encoding peptides for spectra and RTs prediction
[1:22] Predicting spectra and IMs
[20:16] Predicting RTs
[21:15] Decoding predicted spectra and IMs
[21:33] Decoding RTs
[21:45] Saving the library to /current_data/proteomics_analysis/diann_output/lib_mzml.predicted.speclib
[22:11] Initialising library

[22:21] File #1/1
[22:21] Loading run /current_data/proteomics_analysis/K562_DIA_001.mzML
[24:06] 8122661 library precursors are potentially detectable
[24:07] Processing...
[59:28] RT window set to 4.81344
[59:28] Peak width: 7.996
[59:28] Scan window radius set to 17
[59:30] Recommended MS1 mass accuracy setting: 28.0018 ppm
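Since the process exits without an error, the last timestamped entry in the log is the main clue to where it stopped. The following is a minimal sketch of a parser that extracts the progress entries and reports the last stage reached. It assumes the bracketed stamps are elapsed minutes:seconds (minutes can exceed 59, as in later logs in this thread); `parse_progress` and `last_stage` are hypothetical helper names, not part of DIA-NN.

```python
import re

# Matches DIA-NN progress stamps such as "[59:30]".
# Assumption: the stamp is elapsed minutes:seconds since the start of the run.
STAMP = re.compile(r"\[(\d+):(\d{2})\]")


def parse_progress(log_text):
    """Return a list of (elapsed_seconds, message) pairs from DIA-NN log text.

    Works whether the log has one entry per line or has been flattened
    into a single paragraph, because it splits on the stamps themselves.
    """
    stamps = list(STAMP.finditer(log_text))
    entries = []
    for i, match in enumerate(stamps):
        end = stamps[i + 1].start() if i + 1 < len(stamps) else len(log_text)
        # Collapse whitespace so wrapped messages become a single line.
        message = " ".join(log_text[match.end():end].split())
        seconds = int(match.group(1)) * 60 + int(match.group(2))
        entries.append((seconds, message))
    return entries


def last_stage(log_text):
    """Return the final (elapsed_seconds, message) entry, or None if absent."""
    entries = parse_progress(log_text)
    return entries[-1] if entries else None


# Tail of the log posted above.
sample = (
    "[24:07] Processing... [59:28] RT window set to 4.81344 "
    "[59:28] Peak width: 7.996 [59:28] Scan window radius set to 17 "
    "[59:30] Recommended MS1 mass accuracy setting: 28.0018 ppm"
)
print(last_stage(sample))
```

For the log above, this reports that the run got as far as the recommended MS1 mass accuracy message, i.e. it stopped somewhere in the step that follows it.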

vdemichev commented 7 months ago

Could it be out of RAM? Btw, please do in silico prediction and actual raw data analysis in separate steps, i.e. first generate a .predicted.speclib and then analyse using it.
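The two-step workflow suggested here can be sketched as two separate invocations: one that only builds the predicted library from the FASTA, and one that only analyses the raw file against that library. This is a sketch under assumptions, not a verified recipe. `--fasta` and `--fasta-search` are taken from the DIA-NN documentation rather than from this thread, while the remaining flags appear in the command posted later in the thread. The output file names `lib_step1.tsv` and `report_step2.tsv` are hypothetical, and the rule that `--out-lib X.tsv` yields `X.predicted.speclib` is inferred from the logs in this thread.

```python
import shlex
from pathlib import PurePosixPath

DIANN = "diann-1.8.1"  # binary name as used in this thread


def predict_library_cmd(fasta, out_lib, threads=40):
    """Step 1: in silico digest and deep-learning prediction, no raw data."""
    return [
        DIANN,
        "--fasta", fasta,
        "--fasta-search",   # digest the FASTA (library-free mode)
        "--gen-spec-lib",
        "--predictor",
        "--out-lib", out_lib,
        "--threads", str(threads),
        # Digest and modification options would be added here so that
        # the library is generated with the intended settings.
    ]


def predicted_lib_path(out_lib):
    """Assumed naming rule: lib.tsv -> lib.predicted.speclib."""
    return str(PurePosixPath(out_lib).with_suffix(".predicted.speclib"))


def analyse_cmd(raw_file, predicted_lib, report, threads=40):
    """Step 2: analyse the run with the existing library, no prediction."""
    return [
        DIANN,
        "--f", raw_file,
        "--lib", predicted_lib,
        "--out", report,
        "--qvalue", "0.01",
        "--matrices",
        "--threads", str(threads),
    ]


base = "/current_data/proteomics_analysis"
out_lib = f"{base}/diann_output/lib_step1.tsv"  # hypothetical file name
step1 = predict_library_cmd(f"{base}/decoy_uniprot_sprot_human_iRTpep.fasta", out_lib)
step2 = analyse_cmd(
    f"{base}/K562_DIA_001.mzML",
    predicted_lib_path(out_lib),
    f"{base}/diann_output/report_step2.tsv",  # hypothetical file name
)
print(shlex.join(step1))
print(shlex.join(step2))
```

Building the commands as argument lists keeps the two steps visibly separate and makes it easy to check that step 2 carries no prediction flags before anything is launched.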

bioinformatic-guy commented 7 months ago

I don't think it is a RAM-related issue, because nothing else was running on the server at the time. The .predicted.speclib file does get generated in the specified path, but as you can see from the log, the analysis as a whole does not complete.
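An idle server does not by itself rule out memory exhaustion, since a single process can use up the available RAM on its own. One way to check is to look for OOM-killer entries in the kernel log. Below is a minimal sketch that scans kernel log text for `Killed process <pid> (<name>)` lines; the sample text is illustrative and not taken from this thread, and `find_oom_kills` is a hypothetical helper name.

```python
import re

# The Linux OOM killer logs lines containing "Killed process <pid> (<name>)".
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text, name_fragment="diann"):
    """Return (pid, process_name, line) for OOM kills matching name_fragment."""
    hits = []
    for line in kernel_log_text.splitlines():
        match = OOM_KILL.search(line)
        if match and name_fragment in match.group(2):
            hits.append((int(match.group(1)), match.group(2), line.strip()))
    return hits


# Illustrative kernel log excerpt (made up for this example).
sample = """\
[91234.500000] Out of memory: Killed process 4242 (diann-1.8.1) total-vm:140000000kB
[91240.100000] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[91300.000000] Out of memory: Killed process 5151 (python3) total-vm:1000kB
"""
for pid, name, line in find_oom_kills(sample):
    print(f"OOM kill: pid={pid} name={name}")
```

On the server, the input text could come from `journalctl -k` or `dmesg` (the latter may need elevated privileges on Ubuntu). An empty result would suggest looking elsewhere for the cause.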

I will definitely try what you suggested: save the spectral library first and then use it for the DIA-NN analysis.

Thanks for the help. I will post an update if any further issues come up.

bioinformatic-guy commented 6 months ago

@vdemichev Thanks for your suggestion, it works. I have a few questions about this.

  1. What could be the reason for this? (That is, why does this usually happen: RAM?) We have 125 GB of RAM.
  2. Is it required, or good practice, to generate a spectral library for every analysis? If I don't generate a spectral library, will the analysis take less time?

diann-1.8.1 --f "/current_data/proteomics_analysis/K562_DIA_001.mzML" --lib "/current_data/proteomics_analysis/diann_output/lib_mzml_2.predicted.speclib" --threads 40 --verbose 1 --out "/current_data/proteomics_analysis/diann_output/report_mzml_3.tsv" --qvalue 0.01 --matrices --out-lib "/current_data/proteomics_analysis/diann_output/lib_mzml_3.tsv" --gen-spec-lib --predictor --var-mods 1 --var-mod UniMod:35,15.994915,M --double-search --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Wed May 1 19:10:45 2024
Logical CPU cores: 64
Thread number set to 40
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Maximum number of variable modifications set to 1
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Neural networks will be used for peak selection
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

1 files will be processed
[0:00] Loading spectral library /current_data/proteomics_analysis/diann_output/lib_mzml_2.predicted.speclib
[0:13] Library annotated with sequence database(s): /current_data/proteomics_analysis/decoy_uniprot_sprot_human_iRTpep.fasta
[0:13] Protein names missing for some isoforms
[0:13] Gene names missing for some isoforms
[0:13] Library contains 20362 proteins, and 20142 genes
[0:22] Spectral library loaded: 40726 protein isoforms, 58872 protein groups and 11215130 precursors in 3486582 elution groups.
[0:24] Encoding peptides for spectra and RTs prediction
[0:48] Predicting spectra and IMs
[14:04] Predicting RTs
[15:06] Decoding predicted spectra and IMs
[15:23] Decoding RTs
[15:36] Saving the library to /current_data/proteomics_analysis/diann_output/lib_mzml_3.predicted.speclib
[15:46] Initialising library

[15:58] File #1/1
[15:58] Loading run /current_data/proteomics_analysis/K562_DIA_001.mzML
[17:37] 8122661 library precursors are potentially detectable
[17:38] Processing...
[52:40] RT window set to 4.81344
[52:40] Peak width: 7.996
[52:40] Scan window radius set to 17
[52:42] Recommended MS1 mass accuracy setting: 28.0018 ppm
[114:06] Optimised mass accuracy: 42.3498 ppm
[134:05] Removing low confidence identifications
[134:06] Removing interfering precursors
[134:11] Training neural networks: 27381 targets, 26497 decoys
[134:14] Number of IDs at 0.01 FDR: 14829
[150:52] Removing low confidence identifications
[150:53] Removing interfering precursors
[150:59] Training neural networks: 41532 targets, 28104 decoys
[151:02] Number of IDs at 0.01 FDR: 15559
[151:03] Calculating protein q-values
[151:04] Number of genes identified at 1% FDR: 2402 (precursor-level), 2381 (protein-level) (inference performed using proteotypic peptides only)
[151:04] Quantification
[151:06] Quantification information saved to /current_data/proteomics_analysis/K562_DIA_001.mzML.quant.

[151:09] Cross-run analysis
[151:09] Reading quantification information: 1 files
[151:09] Quantifying peptides
[151:10] Assembling protein groups
[151:15] Quantifying proteins
[151:15] Calculating q-values for protein and gene groups
[151:16] Calculating global q-values for protein and gene groups
[151:16] Writing report
[151:17] Report saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.tsv.
[151:17] Saving precursor levels matrix
[151:17] Precursor levels matrix (1% precursor and protein group FDR) saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.pr_matrix.tsv.
[151:17] Saving protein group levels matrix
[151:17] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.pg_matrix.tsv.
[151:17] Saving gene group levels matrix
[151:17] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.gg_matrix.tsv.
[151:17] Saving unique genes levels matrix
[151:17] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.unique_genes_matrix.tsv.
[151:17] Stats report saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.stats.tsv
[151:17] Generating spectral library:
[151:17] 15559 precursors passing the FDR threshold are to be extracted
[151:17] Loading run /current_data/proteomics_analysis/K562_DIA_001.mzML
[153:02] 8122661 library precursors are potentially detectable
[153:07] 9409 spectra added to the library
[153:07] Saving spectral library to /current_data/proteomics_analysis/diann_output/lib_mzml_3.tsv
[153:09] 15559 precursors saved
[153:09] Loading the generated library and saving it in the .speclib format
[153:09] Loading spectral library /current_data/proteomics_analysis/diann_output/lib_mzml_3.tsv
[153:09] Spectral library loaded: 3096 protein isoforms, 3006 protein groups and 15559 precursors in 13899 elution groups.
[153:09] Protein names missing for some isoforms
[153:09] Gene names missing for some isoforms
[153:09] Library contains 0 proteins, and 0 genes
[153:09] Saving the library to /current_data/proteomics_analysis/diann_output/lib_mzml_3.tsv.speclib
[153:09] Log saved to /current_data/proteomics_analysis/diann_output/report_mzml_3.log.txt
Finished
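The log for this second run shows spectra and RT prediction running again (roughly from [0:24] to [15:36]) even though a `.predicted.speclib` file was passed with `--lib`. This appears to follow from `--predictor` still being on the command line. Below is a minimal sketch of a pre-flight check on the argument list. It assumes that dropping `--predictor` is the right response when the library is already predicted; that is an inference from the log, not something confirmed in this thread. `drop_redundant_predictor` is a hypothetical helper name.

```python
def drop_redundant_predictor(args):
    """Return a copy of args without --predictor if --lib is already predicted.

    Assumption (inferred from the log, not confirmed by the maintainer):
    when --lib names a .predicted.speclib file, --predictor only repeats
    the deep-learning prediction step.
    """
    lib = None
    for i, arg in enumerate(args[:-1]):
        if arg == "--lib":
            lib = args[i + 1]
    if lib is not None and lib.endswith(".predicted.speclib"):
        return [arg for arg in args if arg != "--predictor"]
    return list(args)


# Abbreviated form of the command posted in this thread.
posted = [
    "diann-1.8.1",
    "--f", "/current_data/proteomics_analysis/K562_DIA_001.mzML",
    "--lib", "/current_data/proteomics_analysis/diann_output/lib_mzml_2.predicted.speclib",
    "--threads", "40",
    "--gen-spec-lib",
    "--predictor",
]
print(" ".join(drop_redundant_predictor(posted)))
```

If the assumption holds, the run would skip the prediction phase and start loading the raw file shortly after the library is read.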

vdemichev commented 6 months ago
  1. Not sure why this could be happening. In any case, the DIA-NN code has been largely overhauled in newer versions, including memory usage, so the problem should disappear.
  2. It is enough to do it once per FASTA (or set of FASTAs), e.g. once for each species of interest. Yes, generating the library as a separate step is always recommended for reproducibility. The only reasons DIA-NN supports doing this 'on the fly' are to make things easier for first-time users and to stay compatible with the legacy library-free mode, which did not use deep learning.
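The "once per FASTA" advice can be turned into a small cache lookup, so that a predicted library is regenerated only when the FASTA content changes. The sketch below derives a key from the FASTA file contents and a dictionary of settings. Including the settings in the key is an added assumption (the reply only mentions the FASTA), and `library_key`, `cached_library` and the settings names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def library_key(fasta_paths, settings):
    """Hash FASTA contents plus settings into a short, stable key."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in fasta_paths):
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large FASTA files are not loaded at once.
            while block := handle.read(1 << 20):
                digest.update(block)
    for name in sorted(settings):
        digest.update(f"{name}={settings[name]};".encode())
    return digest.hexdigest()[:16]


def cached_library(cache_dir, fasta_paths, settings):
    """Return (path, exists) for the predicted library matching the inputs."""
    key = library_key(fasta_paths, settings)
    path = Path(cache_dir) / f"{key}.predicted.speclib"
    return path, path.exists()


# Demo with a throwaway FASTA file (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "demo.fasta"
    fasta.write_text(">sp|P00000|DEMO_HUMAN\nMPEPTIDEK\n")
    settings = {"missed_cleavages": 1, "var_mods": "UniMod:35"}
    path, exists = cached_library(tmp, [fasta], settings)
    print(path.name, "reuse" if exists else "needs prediction")
```

When the lookup reports an existing file, the analysis step can point `--lib` at it directly; otherwise the library is predicted once and saved under that name.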