vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Running Bruker .d files in Linux cluster #812

Open camdouglas opened 1 year ago

camdouglas commented 1 year ago

Hello,

I want to move my data analysis to a cluster environment to clear a backlog on my local machine. I am attempting to run DIA-NN on my institution's Linux cluster. I am using .d data from a Bruker timsTOF Pro 2, and I keep getting the following error message: "ERROR: either the dia-PASEF file is damaged or a .dia file produced by an older DIA-NN version has been loaded: performance might be suboptimal, regenerate the .dia file using this DIA-NN version"

The files on the cluster have not been manipulated or edited, so I believe that I am executing the command incorrectly.
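Because the error mentions a possibly damaged dia-PASEF file, one way to rule out an incomplete transfer is to check that every .d folder arrived with its main data files. The sketch below is only an illustration: it assumes each Bruker .d folder should contain non-empty `analysis.tdf` and `analysis.tdf_bin` files, and the function name `check_d_folders` is made up for this example. A clean result would not rule out other causes of the error.

```python
from pathlib import Path

# Assumption: a timsTOF .d folder holds at least these two files.
REQUIRED_FILES = ("analysis.tdf", "analysis.tdf_bin")


def check_d_folders(data_dir):
    """Return {folder name: [missing or empty files]} for each *.d entry in data_dir.

    Folders that contain every required file with a non-zero size are omitted.
    """
    problems = {}
    for entry in sorted(Path(data_dir).glob("*.d")):
        if not entry.is_dir():
            # A .d "file" instead of a folder usually means a broken copy.
            problems[entry.name] = ["not a directory"]
            continue
        bad = [
            name
            for name in REQUIRED_FILES
            if not (entry / name).is_file() or (entry / name).stat().st_size == 0
        ]
        if bad:
            problems[entry.name] = bad
    return problems
```

Calling `check_d_folders("/blue/cseath/camerondouglas/06.26.2023_HDACi_data")` on the cluster and getting an empty dictionary back would suggest the folders were at least copied in full.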

I am wondering if anyone has any advice on how to resolve this error. I have attached the output report below. Please let me know if you have recommendations. Thank you very much.

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Fri Sep 29 16:28:18 2023
Logical CPU cores: 64
./diann-1.8.1 --dir /blue/cseath/camerondouglas/06.26.2023_HDACi_data --lib /blue/cseath/camerondouglas/diann_files/report-lib.predicted.speclib --threads 16 --verbose 1 --out /blue/cseath/camerondouglas/06.26.2023_HDACi_data/report_out.tsv --qvalue 0.01 --matrices --out-lib /blue/cseath/camerondouglas/06.26.2023_HDACi_data/spec_lib_out.tsv --gen-spec-lib --reannotate --fasta /blue/cseath/camerondouglas/diann_files/uniprot-download_true_format_fasta_query__2A_20AND_20_28model_organi-2022.08.08-18.40.51.64.fasta --met-excision --cut K,R --missed-cleavages 3 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 1 --max-pr-charge 4 --unimod4 --var-mods 3 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,*n --monitor-mod UniMod:1 --mass-acc 15 --mass-acc-ms1 15 --use-quant --reanalyse --smart-profiling --peak-center --no-ifs-removal

Thread number set to 16
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Library precursors will be reannotated using the FASTA database
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 3
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 1
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 3
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at *n will be considered as variable
Existing .quant files will be used
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
The following variable modifications will be scored: UniMod:1

18 files will be processed
[0:00] Loading spectral library /blue/cseath/camerondouglas/diann_files/report-lib.predicted.speclib
[0:02] Library annotated with sequence database(s): C:\Users\ciaran\Desktop\FASTA FILES\uniprot-download_true_format_fasta_query__2A_20AND_20_28model_organi-2022.08.08-18.40.51.64.fasta
[0:03] Spectral library loaded: 20373 protein isoforms, 29157 protein groups and 4293202 precursors in 1336696 elution groups.
[0:03] Loading FASTA /blue/cseath/camerondouglas/diann_files/uniprot-download_true_format_fasta_query__2A_20AND_20_28model_organi-2022.08.08-18.40.51.64.fasta
[0:27] Reannotating library precursors with information from the FASTA database
[0:31] Finding proteotypic peptides (assuming that the list of UniProt ids provided for each peptide is complete)
[0:31] 4293202 precursors generated
[0:31] Gene names missing for some isoforms
[0:31] Library contains 20373 proteins, and 20155 genes
[0:33] Initialising library

[0:37] First pass: generating a spectral library from DIA data
[0:37] File #1/18
[0:37] Loading run /blue/cseath/camerondouglas/06.26.2023_HDACi_data/MS223061_CS_CD_Romidepsin_4_Slot2-19_1_2535.d
ERROR: either the dia-PASEF file is damaged or a .dia file produced by an older DIA-NN version has been loaded: performance might be suboptimal, regenerate the .dia file using this DIA-NN version
WARNING: incorrectly recorded isolation window margins
[0:37] 0 library precursors are potentially detectable
[0:37] Processing...
[0:37] Removing low confidence identifications
[0:37] Searching PTM decoys
[0:37] Removing interfering precursors
[0:37] Too few confident identifications, neural networks will not be used
[0:37] Number of IDs at 0.01 FDR: 0
[0:37] Calculating protein q-values
[0:37] Number of genes identified at 1% FDR: 0 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[0:37] Quantification
[3:11] Quantification information saved to /blue/cseath/camerondouglas/06.26.2023_HDACi_data/MS223061_CS_CD_Romidepsin_4_Slot2-19_1_2535.d.quant.
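With 18 runs in the queue, it may help to scan the full log for runs that hit the same problem instead of reading it by eye. The following is a rough sketch that relies only on the message wording visible in the log above ("Loading run", "ERROR:", "Number of IDs at ... FDR:"). The function name `summarise_log` is hypothetical, and the run-path pattern assumes paths contain no spaces.

```python
import re

# Patterns taken from the log text above; other DIA-NN versions may word these differently.
RUN_RE = re.compile(r"Loading run (\S+)")  # assumes run paths without spaces
IDS_RE = re.compile(r"Number of IDs at [\d.]+ FDR: (\d+)")


def summarise_log(text):
    """Return one dict per run: its path, whether an ERROR followed, and the ID count."""
    runs = []
    matches = list(RUN_RE.finditer(text))
    for i, match in enumerate(matches):
        # Each run's section ends where the next "Loading run" begins.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        section = text[match.end():end]
        ids = IDS_RE.search(section)
        runs.append(
            {
                "run": match.group(1),
                "has_error": "ERROR:" in section,
                "ids": int(ids.group(1)) if ids else None,
            }
        )
    return runs
```

For example, `[r["run"] for r in summarise_log(open("report_out.log.txt").read()) if r["has_error"] or r["ids"] == 0]` would list the runs worth a closer look.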

report_out.log.txt

vdemichev commented 1 year ago

I would try also with