Open santoshdbhosale opened 2 years ago
Hi Santosh,
No. Once you have obtained a predicted spectral library (.predicted.speclib), neither FASTA digest nor deep learning spectra prediction should be used again.
Yes, speed like this is fine for a lib-free search of dia-PASEF data. I cannot see what kind of PC you have, but on 8 cores that would be expected.
Best, Vadim
Hi Vadim,
Thank you for the quick response. I was wondering whether I should generate the refined library from the FASTA-predicted library (.predicted.speclib) when the real sample dia-PASEF files are loaded. Does this make any sense at all? Thanks, Santosh
This is automatic with MBR
But as you can see in the screenshot above, the option to generate a spectral library is not enabled.
That means it was 'unclicked'. Checking MBR automatically enables it.
So should I restart the search?
Yes
Okay. Thank you. So, do you recommend searching the .d files again with the refined spectral library (built from the DIA files)? Thanks, Santosh
The easiest is to generate an in silico predicted spectral library and process the entire experiment with it with MBR enabled.
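A minimal sketch of that two-step workflow, written in Python only to assemble the two command lines. The flags are the ones that appear in the command line quoted further down in this thread (--fasta, --fasta-search, --gen-spec-lib, --predictor, --out-lib, --lib, --f, --out, --reanalyse). The helper names and all paths are hypothetical, and treating --reanalyse as the command-line counterpart of MBR is an assumption based on the log message "A spectral library will be created from the DIA runs and used to reanalyse them". The path of the predicted library is passed in explicitly, because the exact file name DIA-NN writes is not stated here.

```python
def library_prediction_args(fasta, out_lib, exe="diann.exe"):
    """Step 1: predict an in silico library from a FASTA digest (no raw files)."""
    return [
        exe,
        "--fasta", fasta,
        "--fasta-search",   # in silico digest of the FASTA
        "--gen-spec-lib",   # write a spectral library
        "--predictor",      # deep learning prediction of spectra
        "--out-lib", out_lib,
    ]


def experiment_search_args(raw_files, predicted_lib, report, exe="diann.exe"):
    """Step 2: search the whole experiment with the predicted library.

    No FASTA digest or prediction flags here: the library already exists.
    """
    args = [exe]
    for path in raw_files:
        args += ["--f", path]
    args += [
        "--lib", predicted_lib,
        "--out", report,
        "--reanalyse",      # assumed to correspond to MBR in the GUI
    ]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(library_prediction_args("human.fasta", "lib"))
    print(experiment_search_args(["run1.d", "run2.d"],
                                 "lib.predicted.speclib", "report.tsv"))
```

Building the arguments as a list keeps paths with spaces intact if the list is later handed to subprocess.run.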
Hi Vadim, I am experiencing 2 issues:
here is the command line: diann.exe --f "M:\sciex\swath data\O4_5uL_7sep22.wiff " --f "M:\sciex\swath data\O2_5uL_7sep22.wiff " --lib "" --threads 16 --verbose 2 --out "M:\DIA-NN\5uL_O2_O4\report.tsv" --qvalue 0.01 --matrices --temp "M:\DIA-NN\5uL_O2_O4\tmp" --out-lib "M:\DIA-NN\5uL_O2_O4" --gen-spec-lib --predictor --fasta "M:\our fuckin data\fasta_files\uniprot_mus_musculus_filtered_reviewed.fasta" --fasta-search --min-fr-mz 100 --max-fr-mz 1600 --met-excision --cut K,R,!P --missed-cleavages 2 --min-pep-len 7 --max-pep-len 30 --min-pr-mz 400 --max-pr-mz 1250 --min-pr-charge 1 --max-pr-charge 5 --unimod4 --var-mods 2 --var-mod UniMod:35,15.994915,M --var-mod UniMod:1,42.010565,n --monitor-mod UniMod:1 --var-mod UniMod:21,79.966331,STY --monitor-mod UniMod:21 --var-mod UniMod:121,114.042927,K --monitor-mod UniMod:121 --no-cut-after-mod UniMod:121 --mass-acc 15 --mass-acc-ms1 15 --double-search --reanalyse --relaxed-prot-inf --smart-profiling --no-ifs-removal --no-norm
and here is the log output so far:
Current date and time: Thu Sep 8 22:53:14 2022
CPU: GenuineIntel Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
SIMD instructions: AVX SSE4.1 SSE4.2
Logical CPU cores: 16
Thread number set to 16
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 100
Max fragment m/z set to 1600
N-terminal methionine excision enabled
In silico digest will involve cuts at K,R
But excluding cuts at P
Maximum number of missed cleavages set to 2
Min peptide length set to 7
Max peptide length set to 30
Min precursor m/z set to 400
Max precursor m/z set to 1250
Min precursor charge set to 1
Max precursor charge set to 5
Cysteine carbamidomethylation enabled as a fixed modification
Maximum number of variable modifications set to 2
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:1 with mass delta 42.0106 at n will be considered as variable
Modification UniMod:21 with mass delta 79.9663 at STY will be considered as variable
Modification UniMod:121 with mass delta 114.043 at K will be considered as variable
Neural networks will be used for peak selection
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Interference removal from fragment elution curves disabled
Normalisation disabled
Mass accuracy will be fixed to 1.5e-05 (MS2) and 1.5e-05 (MS1)
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
The following variable modifications will be scored: UniMod:1 UniMod:21 UniMod:121
WARNING: double-pass mode is incompatible with PTM scoring, turned off
DIA-NN will discard peptides obtained using in silico cuts after the following modifications: UniMod:121,

2 files will be processed
[0:00] Loading FASTA M:\our fuckin data\fasta_files\uniprot_mus_musculus_filtered_reviewed.fasta
[0:28] Processing FASTA
[5:58] Assembling elution groups
[11:06] 46170443 precursors generated
[11:06] Gene names missing for some isoforms
[11:06] Library contains 17066 proteins, and 16696 genes
[11:21] Encoding peptides for spectra and RTs prediction
[15:39] Predicting spectra and IMs
Total RAM usage of the machine is 31% (37GB), and CPU usage stays around 13%.
Any idea what I am doing wrong?
Regards, George
Hi George,
Yes, in silico prediction for 46 million precursors is slow. Why are you searching for both phospho & ubiquitin? Is the sample enriched for both? Anyway, reducing the precursor charge range to 2-3 and restricting the precursor mass range to the actual range acquired in the runs will speed things up. NOT enabling M(ox) will also help. For phospho you'd probably want max 3 var mods, not 2.
Yes, a 14 GB FASTA is probably not a good idea to search against :) It is just too large to predict in silico. I would suggest trying to search huge databases using FragPipe, which is integrated with DIA-NN.
Best, Vadim
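To illustrate why the charge range and the variable modifications matter so much for the size of the search space, here is a rough Python sketch that counts how many precursors a single peptide can expand into. It is a back-of-the-envelope model, not DIA-NN's actual enumeration: it assumes each candidate site carries at most one modification, treats the peptide N-terminus as one extra site, and ignores m/z range filtering and the --no-cut-after-mod rule. The function names and the example peptide are hypothetical.

```python
from math import comb


def count_modified_forms(peptide, mod_residues, n_term_mod, max_var_mods):
    """Number of modification states with 0..max_var_mods modified sites."""
    sites = sum(1 for aa in peptide if aa in mod_residues)
    if n_term_mod:
        sites += 1  # N-terminal acetylation counted as one extra site
    return sum(comb(sites, k) for k in range(max_var_mods + 1))


def count_precursors(peptide, mod_residues, n_term_mod, max_var_mods,
                     min_charge, max_charge):
    """Modified forms multiplied by the number of charge states considered."""
    charges = max_charge - min_charge + 1
    return charges * count_modified_forms(
        peptide, mod_residues, n_term_mod, max_var_mods)


if __name__ == "__main__":
    pep = "MSTKYR"  # hypothetical peptide
    # M(ox), phospho (STY), GlyGly (K), N-term acetyl, 2 var mods, charge 1-5
    wide = count_precursors(pep, "MSTYK", True, 2, 1, 5)
    # No variable modifications, charge 2-3
    narrow = count_precursors(pep, "", False, 2, 2, 3)
    print(wide, narrow)
```

Under these assumptions the wide settings turn this one peptide into 110 candidate precursors and the narrow settings into 2, which is the kind of multiplier that leads to tens of millions of precursors across a whole proteome.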
Thank you Vadim :)
Thank you, Vadim.
I followed your advice, removed both phospho & ubiquitin, and managed to get 2076 and 2049 identified proteins, respectively, from 2 runs (90-minute gradient, 5 ug peptides each; mouse myocyte whole-cell total protein, trypsin digest). Total DIA-NN runtime was 97 minutes at log level 5 (I like to see what's going on!). Now I'll try a longer gradient and then high pH fractionation. I run SWATH on a TripleTOF 5600+ with 64 variable windows over a range of 400 - 1250 m/z. I will also try GFP. Is there an easy way to combine all the resulting libraries into one?
Regards, George
Hi George,
You can load multiple libraries into DIA-NN with multiple --lib options, but you need to make sure that the RT scale is the same in all of them.
Best, Vadim
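As a small illustration of the multiple --lib suggestion above, the following Python sketch assembles such a command line. Only --f, --lib and --out are used, all of which appear in the command quoted earlier in this thread; the function name and the file names are hypothetical. The sketch does not check that the libraries share an RT scale, which still has to be verified separately.

```python
def multi_library_args(raw_files, libraries, report, exe="diann.exe"):
    """Build a DIA-NN argument list that loads several spectral libraries."""
    if not libraries:
        raise ValueError("at least one library is required")
    args = [exe]
    for path in raw_files:
        args += ["--f", path]
    for lib in libraries:
        args += ["--lib", lib]  # one --lib per library
    args += ["--out", report]
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(multi_library_args(
        ["run1.wiff"],
        ["long_gradient.tsv", "high_ph_fractions.tsv"],
        "report.tsv",
    ))
```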
Hi Vadim,
I am currently analyzing the dia-PASEF data for serum samples. Following are the steps I used to do the analysis.
Thank you, Santosh