vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Spectral library generation from fasta failed #817

Open KathiBio opened 11 months ago

KathiBio commented 11 months ago

I tried to generate a spectral library from a .fasta file, but after 13 min DIA-NN exited with no error in the log file:

diann.exe --lib "" --threads 16 --verbose 1 --out "D:\SWATH libraries\MicrobialDB\SpecLib_generation.tsv" --qvalue 0.01 --matrices --out-lib "D:\SWATH libraries\MicrobialDB\Microbiome.tsv" --gen-spec-lib --predictor --fasta "D:\SWATH libraries\Microbiome.fasta" --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --met-excision --cut K,R --missed-cleavages 1 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 300 --max-pr-mz 1800 --min-pr-charge 2 --max-pr-charge 4 --unimod4 --var-mods 1 --var-mod UniMod:35,15.994915,M --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Wed Oct 4 10:03:15 2023
CPU: GenuineIntel Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
SIMD instructions: AVX AVX2 AVX512CD AVX512F FMA SSE4.1 SSE4.2
Logical CPU cores: 32
Thread number set to 16
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R
Maximum number of missed cleavages set to 1
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 300
Max precursor m/z set to 1800
Min precursor charge set to 2
Max precursor charge set to 4
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 1
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library

0 files will be processed
[0:00] Loading FASTA D:\SWATH libraries\Microbiome.fasta
[13:32] Processing FASTA

DIA-NN exited

The FASTA headers are not from UniProt. Could that be a problem? For example:

SEQF1003_00024 Translation initiation factor IF-2 [HMT-750 Lancefieldella rimae ATCC 49626]

Also, the DB is quite big (2,000,000 KB).

Hope you can help me here.

Best, Katharina

vdemichev commented 11 months ago

Hi Katharina,

Could it be an out-of-memory error? First thing to try: run it on part of the FASTA (copy-pasted in Notepad) to see if this is purely a RAM issue.

Best, Vadim

KathiBio commented 11 months ago

This seems to be the issue. I tried it with a very small subset and it worked. Is it possible to create a library in parts?

vdemichev commented 11 months ago

Yes, you can generate a .predicted.speclib from each part, convert each to .tsv, and then load the multiple .tsv files. But I am not sure if this will be more memory-efficient.
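For the "parts" approach, the FASTA has to be cut at entry boundaries so that no protein is split in half. A minimal sketch of one way to do that is below; it is not part of DIA-NN, and the function name `split_fasta`, the part-file naming, and the default of 100,000 entries per part are all assumptions made for illustration. It streams the file line by line in binary mode, so the 2 GB input is never held in memory.

```python
from pathlib import Path


def split_fasta(src, out_dir, records_per_part=100_000):
    """Split a FASTA file into parts without cutting an entry in half.

    Hypothetical helper (not a DIA-NN feature). Returns the list of
    part files written, in order.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    count = 0
    # Binary mode keeps every byte unchanged, whatever the file encoding is.
    with open(src, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                # A header line starts a new entry; open a new part
                # when there is none yet or the current one is full.
                if out is None or count == records_per_part:
                    if out is not None:
                        out.close()
                    name = f"{src.stem}.part{len(parts) + 1:03d}.fasta"
                    part_path = out_dir / name
                    parts.append(part_path)
                    out = open(part_path, "wb")
                    count = 0
                count += 1
            if out is not None:  # ignore anything before the first header
                out.write(line)
    if out is not None:
        out.close()
    return parts
```

Calling it on the file from this issue with an output folder of your choice would produce files such as `Microbiome.part001.fasta`, each of which can then be passed to DIA-NN separately. Running only the first part is also a quick way to repeat the small-subset test without editing the file by hand.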

vdemichev commented 11 months ago

It could also be an unknown symbol in one of the FASTA entries?
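One way to check for stray symbols is to scan the sequence lines for characters outside an expected alphabet. The sketch below is only illustrative: the `EXPECTED` set (20 standard residues plus the codes B, J, O, U, X, Z) is an assumption, since this thread does not say which symbols DIA-NN accepts, and `find_odd_symbols` is a hypothetical helper name.

```python
from collections import Counter

# Assumption: 20 standard amino acids plus common ambiguity/rare codes.
# Which symbols DIA-NN actually tolerates is not stated in this thread.
EXPECTED = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ")


def find_odd_symbols(fasta_path, expected=EXPECTED, max_hits=20):
    """Report sequence lines containing characters outside `expected`.

    Returns (hits, counts): hits is a list of
    (line_number, header, offending_symbols) for the first `max_hits`
    lines; counts is a Counter of every offending symbol in the file.
    """
    hits, counts = [], Counter()
    header = None
    # errors="replace" lets the scan continue past undecodable bytes,
    # which then show up as U+FFFD in the report.
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                header = line[1:]  # remember which entry we are in
                continue
            odd = [ch for ch in line.upper() if ch not in expected]
            if odd:
                counts.update(odd)
                if len(hits) < max_hits:
                    hits.append((line_no, header, "".join(sorted(set(odd)))))
    return hits, counts
```

An empty `hits` list would suggest that the sequences themselves are clean under this assumed alphabet, which would point back towards memory as the likelier cause. Spaces, digits, or `*` stop markers inside sequence lines would all be reported.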