vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

terminate called after throwing an instance of 'CppSQLite3Exception' #506

Closed: jflucier closed this issue 1 year ago

jflucier commented 1 year ago

Hi

I have built a Singularity image from the DIA-NN Docker image v1.8.1.

I have successfully run DIA-NN on a single sample. See the execution log and submission script: diann-singlesample.slurm.log, diann-singlesample.submit.sh.gz

I then resubmitted with multiple acquisitions (40 samples), and the execution fails with the following error:

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Fri Sep  9 18:40:58 2022
Logical CPU cores: 40
Thread number set to 40
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 2
N-terminal methionine excision enabled
Library-free search enabled
A spectral library will be generated
Copies of the spectral library and the FASTA database will be saved along with the final report
Min peptide length set to 7
Max peptide length set to 30
Min precursor charge set to 1
Max precursor charge set to 5
Min precursor m/z set to 100
Max precursor m/z set to 1700
Min fragment m/z set to 100
Max fragment m/z set to 1500
Deep learning will be used to generate a new in silico spectral library from peptides provided
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Precursor/protein x samples expression level matrices will be saved along with the main report
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Implicit protein grouping: protein names; this determines which peptides are considered 'proteotypic' and thus affects protein FDR calculation
Cysteine carbamidomethylation enabled as a fixed modification
Methionine oxidation enabled as a variable modification
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Mass accuracy will be fixed to 2e-05 (MS2) and 2e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library

40 files will be processed
[0:00] Loading FASTA /localscratch/jflucier.31680231.0/UP000005640_9606.combo.fasta
[0:19] Processing FASTA
[0:44] Assembling elution groups
[1:15] 12160434 precursors generated
[1:16] Gene names missing for some isoforms
[1:16] Library contains 96080 proteins, and 20480 genes
[1:20] Encoding peptides for spectra and RTs prediction
[1:58] Predicting spectra and IMs
[10:21] Predicting RTs
[11:05] Decoding predicted spectra and IMs
[11:24] Decoding RTs
[11:36] Saving the library to /localscratch/jflucier.31680231.0/out/report-lib.predicted.speclib
[11:52] Initialising library

[12:03] First pass: generating a spectral library from DIA data
[12:03] File #1/40
[12:03] Loading run /scratch/jflucier/diann/data/Sheela_DIA_sample_001b_Slot2-22_1_12799.d
For most diaPASEF datasets it is better to manually fix both the MS1 and MS2 mass accuracies to values in the range 10-15 ppm.
[12:59] Run loaded
[13:02] 11407772 library precursors are potentially detectable
[13:04] Processing batch #1 out of 5703 
[13:04] Precursor search
[13:08] Optimising weights
[13:10] Calculating q-values
[13:10] Number of IDs at 0.01 FDR: 0
.....
[7835:13] Run loaded
[7835:18] 11407772 library precursors are potentially detectable
[7835:19] 2 spectra added to the library
[7835:20] Loading run /scratch/jflucier/diann/data/Sheela_DIA_sample_050b_Slot2-17_1_12789.d
[7836:11] Run loaded
[7836:14] 11407772 library precursors are potentially detectable
[7836:16] 599 spectra added to the library
[7836:17] Loading run /scratch/jflucier/diann/data/Sheela_DIA_sample_050_Slot1-47_1_10947.d
[7836:52] Run loaded
[7836:55] 11407772 library precursors are potentially detectable
[7836:57] 139 spectra added to the library
[7836:58] Loading run /scratch/jflucier/diann/data/Sheela_DIA_sample_059b_Slot2-25_1_12805.d
[7837:39] Run loaded
[7837:42] 11407772 library precursors are potentially detectable
[7837:44] 4 spectra added to the library
[7837:45] Loading run /scratch/jflucier/diann/data/Sheela_DIA_sample_059_Slot1-14_1_11196.d
terminate called after throwing an instance of 'CppSQLite3Exception'
/var/spool/slurmd/job31680231/slurm_script: line 84: 269345 Aborted                 singularity exec --writable-tmpfs -e $SLURM_TMPDIR/diann-1.8.1.sif diann --threads 40 --verbose 2 --f /scratch/jflucier/diann/data/Sheela_DIA_sample_001b_Slot2-22_1_12799.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_001_Slot1-30_1_10569.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_002b_Slot2-15_1_12785.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_002_Slot1-31_1_10571.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_008b_Slot2-16_1_12787.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_008_Slot1-5_1_10675.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_020b_Slot2-23_1_12801.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_020_Slot1-17_1_10887.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_037b_Slot2-24_1_12803.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_037_Slot1-34_1_10921.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_050b_Slot2-17_1_12789.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_050_Slot1-47_1_10947.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_059b_Slot2-25_1_12805.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_059_Slot1-14_1_11196.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_061b_Slot2-18_1_12791.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_061_Slot1-19_1_11206.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_071b_Slot2-26_1_12807.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_071_Slot1-26_1_11220.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_075b_Slot2-19_1_12793.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_075_Slot1-30_1_11228.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_084b_Slot2-20_1_12795.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_084_Slot1-39_1_11246.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_086b_Slot2-27_1_12809.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_086_Slot1-41_1_11250.d --f 
/scratch/jflucier/diann/data/Sheela_DIA_sample_088b_Slot2-21_1_12797.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_088_Slot1-43_1_11254.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_091b_Slot2-8_1_12771.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_091_Slot1-46_1_11260.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_092b_Slot2-9_1_12773.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_092_Slot1-47_1_11262.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_093b_Slot2-10_1_12775.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_093_Slot1-48_1_11264.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_096b_Slot2-11_1_12777.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_096_Slot1-51_1_11270.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_098b_Slot2-12_1_12779.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_098_Slot1-53_1_11274.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_099b_Slot2-13_1_12781.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_099_Slot1-54_1_11276.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_100b_Slot2-14_1_12783.d --f /scratch/jflucier/diann/data/Sheela_DIA_sample_100_Slot2-1_1_11278.d --temp $SLURM_TMPDIR/temp --cut K*,R* --missed-cleavages 2 --met-excision --fasta "$SLURM_TMPDIR/UP000005640_9606.combo.fasta" --fasta-search --gen-spec-lib --out-lib "$SLURM_TMPDIR/out/report-lib.tsv" --out-lib-copy --lib "" --out "$SLURM_TMPDIR/out/report.tsv" --mass-acc-ms1 20 --mass-acc 20 --min-pep-len 7 --max-pep-len 30 --min-pr-charge 1 --max-pr-charge 5 --min-pr-mz 100 --max-pr-mz 1700 --min-fr-mz 100 --max-fr-mz 1500 --predictor --reanalyse --matrices --smart-profiling --pg-level 1 --unimod4 --unimod35 --var-mod UniMod:1,42.010565,*n,ntermacetyl
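Rather than listing all 40 runs by hand, the `--f` arguments in a command like the one above could be collected from the data directory with a shell glob. A minimal sketch, assuming every run is a `*.d` folder directly under one directory; `build_f_args` and `F_ARGS` are hypothetical names, not part of DIA-NN:

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill the global array F_ARGS with "--f <run>" pairs,
# one pair per Bruker .d folder found directly under the given directory.
build_f_args() {
  local data_dir="$1" run
  F_ARGS=()
  for run in "$data_dir"/*.d; do
    # If nothing matches, the glob stays literal; skip anything that is not a directory
    [ -d "$run" ] || continue
    F_ARGS+=(--f "$run")
  done
}
```

In a submission script this might be used as `build_f_args /scratch/jflucier/diann/data`, then passing `"${F_ARGS[@]}"` to `diann` in place of the individual `--f` options. Keeping the arguments in an array preserves paths that contain spaces.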

Again, you can take a look at the full log and submission script: diann-multiplesamples.slurm.log, diann-multiplesamples.submit.sh.gz

If I look at the output folder, I see: [screenshot of the output folder]

Can you tell me how to write the command so that it skips spectral library creation, since the library is already in the output folder?

Any help debugging this is appreciated.

vdemichev commented 1 year ago

Well, you can try running again with library generation set to 'IDs, RT and IM profiling'; then it will not need to load the raw files. What happened is some error accessing that raw file on the disk...
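Since the crash happened while a run was being loaded, one cheap pre-flight check is to confirm that each `.d` folder's `analysis.tdf` and `analysis.tdf_bin` are readable and that `analysis.tdf` at least starts like an SQLite database. A sketch, assuming the Bruker timsTOF `.d` layout seen in this thread; `check_d_run` and `check_all_runs` are hypothetical helpers. It only inspects the file header, so it cannot rule out corruption deeper in the file or a transient I/O error on a shared filesystem:

```bash
#!/usr/bin/env bash
# Hypothetical helper: check one Bruker .d folder and print OK / a reason.
check_d_run() {
  local run="$1"
  local tdf="$run/analysis.tdf"
  if [ ! -r "$tdf" ] || [ ! -r "$run/analysis.tdf_bin" ]; then
    echo "MISSING_OR_UNREADABLE $run"
    return 1
  fi
  # An SQLite 3 database file begins with the 16-byte string "SQLite format 3\0";
  # compare the first 15 bytes, since command substitution cannot hold the NUL.
  if [ "$(head -c 15 "$tdf")" != "SQLite format 3" ]; then
    echo "NOT_SQLITE $run"
    return 1
  fi
  echo "OK $run"
}

# Hypothetical helper: check every .d folder under a directory;
# returns non-zero if any run fails.
check_all_runs() {
  local dir="$1" run bad=0
  for run in "$dir"/*.d; do
    [ -d "$run" ] || continue
    check_d_run "$run" || bad=$((bad + 1))
  done
  echo "$bad run(s) failed the check"
  [ "$bad" -eq 0 ]
}
```

Running `check_all_runs /scratch/jflucier/diann/data` before submitting would flag obviously missing or truncated runs without spending hours in DIA-NN first.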

jflucier commented 1 year ago

I performed two tests:

1) I reran the whole pipeline with 5 files, and it passed without problems.
2) I reran the above analysis, specifying --lib with the report-lib.tsv that was generated before the previous crash, and it also passed without problems.
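Rerunning against an existing library, as in test 2, avoids rebuilding it from the FASTA. A sketch that only prints the command for review, assuming the library path is passed to `--lib` and that the FASTA-digest and library-generation options from the original run (`--fasta-search`, `--predictor`, `--gen-spec-lib`, `--out-lib`) are simply left out; `diann_with_lib_cmd` is hypothetical, and any other options from the original command would still need to be added as appropriate:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a DIA-NN command that reuses an existing
# spectral library. Usage: diann_with_lib_cmd <library> <report> <run.d>...
diann_with_lib_cmd() {
  local lib="$1" out="$2" run
  shift 2
  local cmd=(diann --threads 40 --lib "$lib" --out "$out" --matrices)
  for run in "$@"; do
    cmd+=(--f "$run")
  done
  # %q quotes each word so the printed line can be pasted back into a shell
  printf '%q ' "${cmd[@]}"
  printf '\n'
}
```

Printing first, rather than executing, makes it easy to check the final option list before committing a long Slurm job to it.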

I did not find the command-line option for 'IDs, RT and IM profiling' in the documentation.

Thanks for your help.
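For scripts that keep the DIA-NN options in a Bash array, the library-generation mode can be switched in one place. A sketch, assuming (as the later comment in this thread implies) that the GUI setting 'IDs, RT and IM profiling' corresponds to `--rt-profiling` in DIA-NN 1.8.1; `swap_profiling_mode` and `SWAPPED` are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the given arguments into the global array SWAPPED,
# replacing any other profiling mode with --rt-profiling.
swap_profiling_mode() {
  local arg
  SWAPPED=()
  for arg in "$@"; do
    case "$arg" in
      --smart-profiling|--id-profiling) SWAPPED+=(--rt-profiling) ;;
      *) SWAPPED+=("$arg") ;;
    esac
  done
}
```

For example, `swap_profiling_mode "${DIANN_OPTS[@]}"` (where `DIANN_OPTS` is a hypothetical array holding the original options) leaves `--smart-profiling` replaced by `--rt-profiling` and everything else unchanged.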

kostrouc commented 4 months ago

Is DIA-NN compatible with Bruker .d files?

I am observing the same error when running DIA-NN 1.8.1 on Linux. No log file is written, but I do have the printed output. I tried replacing --smart-profiling with --rt-profiling as suggested, but this did not help.

(base) kostrouchov@ip$sudo /usr/diann/1.8.1/diann-1.8.1 --cfg ./20240412_fasta_config.txt--

A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Normalisation disabled
MaxLFQ-based protein quantification disabled
WARNING: unrecognised option [--]
Mass accuracy will be fixed to 1e-05 (MS2) and 1e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
WARNING: it's strongly recommended to use deep learning spectra/RTs prediction. If impossible because some unsupported modifications need to be searched, consider using either (i) the --strip-unknown-mods command or (ii) a 'training library' specified with the --learn-lib command. A training library might facilitate about 1.5x more IDs, deep learning - about 2x-3x more IDs.

153 files will be processed
[0:00] Loading FASTA /mnt/data/kostrouchov/diann_fasta/20240105.fasta
[0:46] Processing FASTA
[1:22] Assembling elution groups
[2:05] 21021765 precursors generated
[2:06] Gene names missing for some isoforms
[2:06] Library contains 174633 proteins, and 65427 genes
[2:08] Initialising library

[2:34] First pass: generating a spectral library from DIA data
[2:34] File #1/153
[2:34] Loading run /mnt/data/kostrouchov/Tims_DIA-PASEF/Tims_DIA-PASEF_S2-D8_1_182.d
terminate called after throwing an instance of 'CppSQLite3Exception'

(base) kostrouchov@ip:/mnt/data/kostrouchov/Tims_DIA-PASEF/Tim_DIA-PASEF_S2-D8_1_182.d$ ls

182.m analysis.tdf chromatography-data-pre.sqlite chromatography-data.sqlite-journal SampleInfo.xml analysis.tdf_bin chromatography-data.sqlite desktop.ini

(base) kostrouchov@ip:/mnt/data/kostrouchov/Tims_DIA-PASEF/Tim_DIA-PASEF_S2-D8_1_182.d/182.m$ ls

InstrumentSetup.isset backup-2024-03-24.m diaSettings.diasqlite lock.file prmSettings.prmsqlite Maldi.method desktop.ini hystar.method microTOFQImpacTemAcquisition.method submethods.xml
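The listing above includes `chromatography-data.sqlite-journal`. SQLite names its rollback journal `<database>-journal`, and a leftover journal can mean a write to that database was interrupted (in some journal modes the file is kept on purpose). Whether DIA-NN reads this particular file is not established in this thread, so treat the following only as a hypothetical diagnostic: a sketch that lists such journal files inside every `.d` folder under a directory, where `find_sqlite_journals` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list SQLite rollback-journal files found inside .d folders.
find_sqlite_journals() {
  local dir="$1"
  # -path matches against the whole path, so '*.d/*' restricts hits to .d folders
  find "$dir" -path '*.d/*' -name '*-journal' -type f | sort
}
```

Comparing this list across the 153 runs (for example, with `find_sqlite_journals /mnt/data/kostrouchov/Tims_DIA-PASEF`) would show whether the failing run is unusual in this respect.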

20240412_fasta_config.txt