vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

DIANN error big dataset #899

Closed ypriverol closed 4 months ago

ypriverol commented 9 months ago

Hi @vdemichev, I'm trying to run a large experiment with quantms. In the last step of quantms, I got the following error:

pst_prd@codon-dm-05:/hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD039236$ tail -n 500 -f work/40/916f1229262d23cf0064ad40e0ef38/assemble_empirical_library.log 
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Thu Jan 11 19:17:15 2024
Logical CPU cores: 48
Thread number set to 48
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Existing .quant files will be used
A fast algorithm will be used to select the MS2 mass accuracy setting
Mass accuracy will be determined separately for different runs
Scan windows will be inferred separately for different runs
A spectral library will be generated
DIA-NN will optimise the mass accuracy separately for each run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

15620 files will be processed
[0:00] Loading spectral library lib.predicted.speclib
[0:04] Library annotated with sequence database(s): Homo-sapiens-uniprot-reviewed-entrap-contaminants-202310.fasta
[0:04] Protein names missing for some isoforms
[0:04] Gene names missing for some isoforms
[0:04] Library contains 20676 proteins, and 20183 genes
[0:06] Spectral library loaded: 41081 protein isoforms, 65581 protein groups and 7397781 precursors in 3688943 elution groups.
[0:06] Initialising library

[0:32] Cross-run analysis
[0:32] Reading quantification information: 15620 files
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_fill_insert

The command is really large because the analysis covers more than 15K files. Here is a summary of the command:

diann {all the mzML files} \
        --lib lib.predicted.speclib \
        --threads 48 \
        --out-lib empirical_library.tsv \
        --verbose 3 \
        --rt-profiling \
        --temp ./quant/ \
        --use-quant \
        --quick-mass-acc --individual-mass-acc \
        --individual-windows \
        --gen-spec-lib \
        2>&1 | tee assemble_empirical_library.log

vdemichev commented 9 months ago

Could be out of memory.

vdemichev commented 9 months ago

This is the DIA-based library creation step; you can do it based on a subset of runs.

ypriverol commented 9 months ago

> This is DIA-based lib creation step, can do this based on a subset of runs

@vdemichev can you suggest a smart way of selecting the subset of runs?

vdemichev commented 9 months ago

With this number of runs, I would just recommend selecting at random.
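
For readers who want to try the random selection suggested above, here is a minimal sketch in Python. It is not how quantms implements it. The helper names `select_random_runs` and `build_library_command`, the subset size of 200, and the seed are all hypothetical choices for illustration, since the thread does not say how many runs to keep. The sketch also assumes each run is passed to DIA-NN with its `--f` option; the remaining flags are copied from the command summary earlier in the thread.

```python
import random
from pathlib import Path


def select_random_runs(run_files, n_runs, seed=42):
    """Return a reproducible random subset of run files (hypothetical helper)."""
    # Sort first so the same seed always yields the same subset,
    # regardless of the order in which the files were discovered.
    files = sorted(str(f) for f in run_files)
    if n_runs >= len(files):
        return files
    rng = random.Random(seed)
    return sorted(rng.sample(files, n_runs))


def build_library_command(subset, threads=48):
    """Assemble the library-creation command for the chosen subset.

    Assumption: runs are supplied via DIA-NN's --f option; the other
    flags mirror the command summary posted in this issue.
    """
    cmd = ["diann"]
    for run in subset:
        cmd += ["--f", run]
    cmd += [
        "--lib", "lib.predicted.speclib",
        "--threads", str(threads),
        "--out-lib", "empirical_library.tsv",
        "--verbose", "3",
        "--rt-profiling",
        "--temp", "./quant/",
        "--use-quant",
        "--quick-mass-acc", "--individual-mass-acc",
        "--individual-windows",
        "--gen-spec-lib",
    ]
    return cmd


if __name__ == "__main__":
    # 200 is an arbitrary illustrative subset size, not a recommendation.
    all_runs = Path(".").glob("*.mzML")
    subset = select_random_runs(all_runs, n_runs=200)
    print(" ".join(build_library_command(subset)))
```

Fixing the seed keeps the subset stable across pipeline restarts, so the empirical library is built from the same runs each time.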

ypriverol commented 9 months ago

@vdemichev I will close the issue when we implement your suggestion in quantms and see if we can finish the dataset.

vbrennsteiner commented 4 months ago

@ypriverol I am facing the same issue - did you solve it in the end?

ypriverol commented 4 months ago

Yes, we solved it in quantms. You can run the latest version of quantms with DIA-NN.