vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

DIANN 1.8.1 Error Linux : Segmentation fault (core dumped) #1185

Open Soulaimane-Aboulouard opened 2 months ago

Soulaimane-Aboulouard commented 2 months ago

Hi Vadim,

I hope you're doing well. I'm encountering an issue while running DIANN on a Linux-based HPC cluster. Occasionally, the analysis crashes with the following error:

Segmentation fault (core dumped)

Despite my efforts, I haven't been able to resolve this issue. Could you please help me troubleshoot this problem?

Thank you in advance for your assistance!

Best, Soulaimane

Report log:

DIA-NN 1.8 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 28 2021 10:59:57
Current date and time: Wed Sep 25 00:22:02 2024
Logical CPU cores: 24
Thread number set to 16
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
In silico digest will involve cuts at K,R*
Maximum number of missed cleavages set to 2
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Existing .quant files will be used
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1e-05 (MS1)

74 files will be processed
[0:00] Loading spectral library /home/soulaimane.aboulouard/DIANN-2/Library/Library-human-DIA-052023.predicted.speclib
[0:18] Library annotated with sequence database(s): F:\database-uniprot\human\Uniprot-Human-Reviewed-20422seq-052023.fasta
[0:23] Spectral library loaded: 42406 protein isoforms, 70096 protein groups and 11061990 precursors in 3253453 elution groups.
[0:23] Loading protein annotations from FASTA /home/soulaimane.aboulouard/DIANN-2/Library/Uniprot-Human-Reviewed-20422seq-052023.fasta
[0:23] Annotating library proteins with information from the FASTA database
[0:23] Gene names missing for some isoforms
[0:23] Library contains 20401 proteins, and 20183 genes
[0:24] Initialising library

[0:32] First pass: generating a spectral library from DIA data
[0:32] Cross-run analysis
[0:32] Reading quantification information: 74 files
[0:36] Quantifying peptides
[0:38] Assembling protein groups
/var/spool/slurm/d/job1093796/slurm_script: line 34: 2079340 Segmentation fault (core dumped) diann-1.8.1 --dir /home/soulaimane.aboulouard/DIANN-2/Sample-DIANN --lib /home/soulaimane.aboulouard/DIANN-2/Library/Library-human-DIA-052023.predicted.speclib --threads 16 --verbose 4 --out /home/soulaimane.aboulouard/DIANN-2/Resultat-DIANN/report.tsv --qvalue 0.01 --matrices --fasta /home/soulaimane.aboulouard/DIANN-2/Library/Uniprot-Human-Reviewed-20422seq-052023.fasta --cut K,R --missed-cleavages 2 --unimod4 --var-mods 3 --var-mod UniMod:35,15.994915,M --mass-acc 15 --mass-acc-ms1 10 --use-quant --reanalyse --relaxed-prot-inf --rt-profiling --peak-center --no-ifs-removal
Job (post) done

vdemichev commented 2 months ago

Hi Soulaimane,

I would suggest switching to 1.9.1. DIA-NN 1.9 and later use a different code base, so whatever errors manifested in 1.8.1 on Linux should not affect 1.9.1. It is currently challenging for me to troubleshoot anything related to 1.8.1, as the DIA-NN development environment is now set up for 1.9.

Best, Vadim

Soulaimane-Aboulouard commented 2 months ago

Hi Vadim,

The issue with version 1.9.1 is that matrix generation doesn't work properly on Linux, which is why I reverted to version 1.8.1 while waiting for it to be resolved. I need the report-pg.matrix, and as far as I can tell version 1.9 on Linux should be avoided for this purpose.

What can I do in this situation?

Soulaimane

vdemichev commented 2 months ago

> I need the report-pg.matrix

In case this is helpful, here's the R code (very easy) to generate pg_matrix from the main report: https://github.com/vdemichev/DiaNN/discussions/1172#discussioncomment-10680048
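For readers without R at hand, the same idea (pivoting the main report into a protein group x run table) can be sketched in Python with only the standard library. This is not the code from the linked discussion. It assumes the main report is a TSV with the columns `Run`, `Protein.Group`, `PG.MaxLFQ`, `Q.Value` and `PG.Q.Value`. The 1% filter on those two q-value columns is also an assumption and may not match what `--matrices` applies internally. The function name `pg_matrix` is hypothetical.

```python
import csv


def pg_matrix(report_path, out_path, value_col="PG.MaxLFQ",
              q_cols=("Q.Value", "PG.Q.Value"), max_q=0.01):
    """Pivot a DIA-NN main report (TSV) into a protein group x run matrix.

    Column names and the q-value filter are assumptions; adjust them to
    match the report actually produced.
    """
    runs = []    # run names, in order of first appearance
    table = {}   # protein group -> {run: quantity as text}
    with open(report_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Skip rows that fail any of the q-value filters.
            if any(float(row[c]) > max_q for c in q_cols):
                continue
            run, group = row["Run"], row["Protein.Group"]
            if run not in runs:
                runs.append(run)
            # Protein-group quantities repeat on every precursor row of a
            # group, so keeping one value per (group, run) is enough.
            table.setdefault(group, {})[run] = row[value_col]

    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["Protein.Group"] + runs)
        for group in sorted(table):
            writer.writerow([group] + [table[group].get(r, "") for r in runs])
    return len(table), len(runs)
```

Calling `pg_matrix("report.tsv", "report.pg_matrix.tsv")` would write one row per protein group and one column per run, leaving a cell empty where a group has no passing rows in that run.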

Best, Vadim

Soulaimane-Aboulouard commented 1 month ago

Dear Vadim,

I have tested the new version of DIA-NN 1.9.2 on a Linux HPC cluster, but I am still encountering the same error. The matrices cannot be retrieved. How can I resolve this issue?

Below is the script:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --time=8:00:00
#SBATCH --job-name=DIANN_SOULAIMANE
#SBATCH --mem=100G
#SBATCH --output=post_%A_%a.out
#SBATCH --mail-type=ALL

module load diann/1.9.2

diann --dir "/home/soulaimane.aboulouard/DIANN-2/Sample" \
   --lib "/home/soulaimane.aboulouard/DIANN-2/Library/DIA-Human-library-20420seq-092024-ox-carba-192.predicted.speclib" \
   --threads 24 \
   --verbose 1 \
   --out "/home/soulaimane.aboulouard/DIANN-2/Resultat/report.tsv" \
   --qvalue 0.01 \
   --matrices \
   --fasta "/home/soulaimane.aboulouard/DIANN-2/Library/Uniprot-Human-Reviewed-20420seq-092024.fasta" \
   --cut "K*,R*" \
   --missed-cleavages 2 \
   --var-mods 3 \
   --var-mod UniMod:35,15.994915,M \
   --unimod4 \
   --mass-acc 15 \
   --mass-acc-ms1 10 \
   --use-quant \
   --peptidoforms \
   --reanalyse \
   --relaxed-prot-inf \
   --rt-profiling \
   --high-acc

echo "Job (post) done"

Below is the log file:

DIA-NN 1.9.2 (Data-Independent Acquisition by Neural Networks)
Compiled on Oct 20 2024 02:59:53
Current date and time: Wed Oct 23 16:33:39 2024
Logical CPU cores: 24
Thread number set to 24
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
In silico digest will involve cuts at K*,R*
Maximum number of missed cleavages set to 2
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Cysteine carbamidomethylation enabled as a fixed modification
Existing .quant files will be used
Peptidoform scoring enabled
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers, GO/pathway and system-scale analyses
The spectral library (if generated) will retain the original spectra but will include empirically-aligned RTs
High accuracy quantification mode enabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1e-05 (MS1)
WARNING: combining reuse of .quant files with automatic optimisation of mass accuracies or scan window will lead to results that are different from those of the original analysis that produced the .quant files and is strongly not recommended
The following variable modifications will be scored: UniMod:35 

36 files will be processed
[0:00] Loading spectral library /home/soulaimane.aboulouard/DIANN-2/Library/DIA-Human-library-20420seq-092024-ox-carba-192.predicted.speclib
[0:14] Library annotated with sequence database(s): /home/soulaimane.aboulouard/DIANN-2/Library/Uniprot-Human-Reviewed-20420seq-092024.fasta
[0:20] Spectral library loaded: 42476 protein isoforms, 70172 protein groups and 7869513 precursors in 3139016 elution groups.
[0:20] Loading protein annotations from FASTA /home/soulaimane.aboulouard/DIANN-2/Library/Uniprot-Human-Reviewed-20420seq-092024.fasta
[0:21] Annotating library proteins with information from the FASTA database
[0:21] Gene names missing for some isoforms
[0:21] Library contains 20400 proteins, and 20199 genes
[0:32] Initialising library

First pass: generating a spectral library from DIA data
[0:59] Cross-run analysis
[0:59] Reading quantification information: 36 files
[1:14] Quantifying peptides
[1:41] Assembling protein groups
[1:45] Quantifying proteins
[1:46] Calculating q-values for protein and gene groups
[1:58] Calculating global q-values for protein and gene groups
[1:59] Protein groups with global q-value <= 0.01: 6546
[2:06] Compressed report saved to /home/soulaimane.aboulouard/DIANN-2/Resultat/report-first-pass.parquet. Use R 'arrow' or Python 'PyArrow' package to process
[2:06] Writing report
[2:46] Report saved to /home/soulaimane.aboulouard/DIANN-2/Resultat/report-first-pass.tsv.
[2:46] Saving precursor levels matrix
[2:47] Precursor levels matrix (1% precursor and protein group FDR) saved to /home/soulaimane.aboulouard/DIANN-2/Resultat/report-first-pass.pr_matrix.tsv.
[2:47] Manifest saved to /home/soulaimane.aboulouard/DIANN-2/Resultat/report-first-pass.manifest.txt
[2:47] Stats report saved to /home/soulaimane.aboulouard/DIANN-2/Resultat/report-first-pass.stats.tsv
[2:47] Generating spectral library:
[2:49] 53905 target and 583 decoy precursors saved
[2:49] Spectral library saved to /home/soulaimane.aboulouard/DIANN-2/Resultat/report-lib.parquet

[2:50] Loading spectral library /home/soulaimane.aboulouard/DIANN-2/Resultat/report-lib.parquet
IOError: Error reading bytes from file. Detail: [errno 14] Bad address
/var/spool/slurm/d/job1158213/slurm_script: line 34: 4053049 Segmentation fault      (core dumped) diann --dir "/home/soulaimane.aboulouard/DIANN-2/Sample" --lib "/home/soulaimane.aboulouard/DIANN-2/Library/DIA-Human-library-20420seq-092024-ox-carba-192.predicted.speclib" --threads 24 --verbose 1 --out "/home/soulaimane.aboulouard/DIANN-2/Resultat/report.tsv" --qvalue 0.01 --matrices --fasta "/home/soulaimane.aboulouard/DIANN-2/Library/Uniprot-Human-Reviewed-20420seq-092024.fasta" --cut K*,R* --missed-cleavages 2 --var-mods 3 --var-mod UniMod:35,15.994915,M --unimod4 --mass-acc 15 --mass-acc-ms1 10 --use-quant --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling --high-acc
Job (post) done

Thank you for your help.

Best regards, Soulaimane

vdemichev commented 1 month ago

Hi Soulaimane,

It's definitely a different error: as you can see, there is a problem loading the newly saved library. Can you please share /home/soulaimane.aboulouard/DIANN-2/Resultat/report-lib.parquet so that I can take a look?

Best, Vadim

Soulaimane-Aboulouard commented 1 month ago

Hi Vadim,

Here is the report-lib.parquet: report-lib.zip

Thank you, Soulaimane

vdemichev commented 1 month ago

Works fine on my PC. Could it be that for some reason that file is not accessible for reading? If you try it again, does it work?

Soulaimane-Aboulouard commented 1 month ago

It doesn’t work, even after restarting. However, if I retrieve the quant files on Windows and run the analysis through the software interface, it works.

On Linux, it doesn’t work, even though file permissions are correctly set.
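Correct permissions alone do not show that every byte of the file can be read back on the cluster filesystem, and the log reports errno 14 (EFAULT, "Bad address") rather than a permission error (EACCES, errno 13). A minimal sketch of a stricter check, to run on the compute node itself: read the whole file with ordinary `read()` calls and confirm the 4-byte `PAR1` marker that Parquet files carry at both ends. The function name `check_parquet_readable` is hypothetical, and passing this check would not rule out a problem specific to how DIA-NN opens the file.

```python
import os

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with this


def check_parquet_readable(path, chunk_size=1 << 20):
    """Read the file end to end and verify the Parquet magic bytes.

    Returns the number of bytes read. Raises OSError if the read fails or
    comes up short, and ValueError if the file does not look like Parquet.
    """
    size = os.path.getsize(path)
    if size < 2 * len(PARQUET_MAGIC):
        raise ValueError(f"{path}: too small to be a Parquet file ({size} bytes)")

    total = 0
    head = b""
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            if len(head) < 4:
                head = (head + chunk)[:4]   # first four bytes of the file
            tail = (tail + chunk)[-4:]      # last four bytes seen so far
            total += len(chunk)

    if total != size:
        raise OSError(f"{path}: read {total} bytes, expected {size}")
    if head != PARQUET_MAGIC or tail != PARQUET_MAGIC:
        raise ValueError(f"{path}: PAR1 marker missing at start or end")
    return total
```

If this succeeds inside the same SLURM job that fails, plain sequential reads of the file work on that node, which narrows the problem to how the library is loaded rather than to the file's contents or permissions.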

Soulaimane-Aboulouard commented 1 month ago

Here is the answer from the university's HPC cluster manager:

Hello Soulaimane,

> Do you think you can help me solve this problem?

Unfortunately, no.

As you say, the file /home/soulaimane.aboulouard/DIANN-3/Resultat/report-lib.parquet is created and you have the necessary rights to read it.

Installing diann simply consists of placing the executable https://github.com/vdemichev/DiaNN/releases/download/1.9.2/diann-1.9.2.Linux.zip on the cluster. We do not have the sources to compile it on Linux.

If your inputs are well-formed, it may be a bug in diann. In that case, you should contact the diann developers.

Have a nice day,

Jan