vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

0 proteins identified with high confidence #848

Open lmin-2023 opened 11 months ago

lmin-2023 commented 11 months ago

Dear Vadim,

I have some issues with identifying high-confidence proteins from SWATH data acquired on a ZenoTOF 7600.

I am using a spectral library predicted from a FASTA file. The SWATH data should contain about 100 proteins. When I initially used a predicted spectral library from the whole mouse proteome, DIA-NN got stuck during processing. I then built a spectral library from only part of the proteome, making sure that the partial FASTA database contains the proteins in the samples. With the smaller spectral library, DIA-NN finished processing the data without generating any error messages. However, Protein.Q.Value, PG.Q.Value, and Global.PG.Q.Value are all above 0.01, although the majority of Q.Value and Global.Q.Value entries for precursors are below 0.01. I also tried the --relaxed-prot-inf option, but it did not help. I looked at the data in Skyline and it looked normal.

Can you suggest what I can try?

Many thanks in advance for your help, Lie

Here is the log file:

diann.exe --f "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\Smp_SW_10242023_10.wiff " --lib "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\Smp_lib_predicted.predicted.speclib" --threads 8 --verbose 1 --out "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.tsv" --qvalue 0.01 --matrices --out-lib "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\outputLib_test.tsv" --gen-spec-lib --predictor --reanalyse --relaxed-prot-inf --smart-profiling --pg-level 0 --peak-center --no-ifs-removal --report-lib-info --relaxed-prot-inf
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Fri Oct 27 15:39:37 2023
CPU: GenuineIntel Intel(R) Core(TM) i7-10700T CPU @ 2.00GHz
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2
Logical CPU cores: 16
Thread number set to 8
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Implicit protein grouping: isoform IDs; this determines which peptides are considered 'proteotypic' and thus affects protein FDR calculation
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.
WARNING: MBR turned off, two or more raw files are required

1 files will be processed
[0:00] Loading spectral library Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\Smp_lib_predicted.predicted.speclib
[0:00] Library annotated with sequence database(s): Z:\ZT7600Data\SWATHtest\Smp\20231025\Smp\Smp.fa
[0:00] Spectral library loaded: 107 protein isoforms, 117 protein groups and 20972 precursors in 6564 elution groups.
[0:00] Encoding peptides for spectra and RTs prediction
[0:00] Predicting spectra and IMs
[0:06] Predicting RTs
[0:07] Decoding predicted spectra and IMs
[0:07] Decoding RTs
[0:07] Saving the library to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\outputLib_test.predicted.speclib
[0:08] Initialising library

[0:08] File #1/1
[0:08] Loading run Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\Smp_SW_10242023_10.wiff
[0:37] 15017 library precursors are potentially detectable
[0:37] Processing...
[0:50] RT window set to 2.21659
[0:50] Peak width: 4.788
[0:50] Scan window radius set to 10
[0:50] Recommended MS1 mass accuracy setting: 4.94531 ppm
[1:03] Optimised mass accuracy: 23.1392 ppm
[1:05] Removing low confidence identifications
[1:05] Removing interfering precursors
[1:06] Training neural networks: 2898 targets, 2280 decoys
[1:06] Number of IDs at 0.01 FDR: 762
[1:06] Calculating protein q-values
[1:06] Number of protein isoforms identified at 1% FDR: 99 (precursor-level), 0 (protein-level) (inference performed using proteotypic peptides only)
[1:06] Quantification
[1:06] Quantification information saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\Smp_SW_10242023_10.wiff.quant.

[1:07] Cross-run analysis
[1:07] Reading quantification information: 1 files
[1:07] Quantifying peptides
[1:07] Assembling protein groups
[1:07] Quantifying proteins
[1:07] Calculating q-values for protein and gene groups
[1:07] Calculating global q-values for protein and gene groups
[1:07] Writing report
[1:07] Report saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.tsv.
[1:07] Saving precursor levels matrix
[1:07] Precursor levels matrix (1% precursor and protein group FDR) saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.pr_matrix.tsv.
[1:07] Saving protein group levels matrix
[1:07] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.pg_matrix.tsv.
[1:07] Saving gene group levels matrix
[1:07] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.gg_matrix.tsv.
[1:07] Saving unique genes levels matrix
[1:07] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.unique_genes_matrix.tsv.
[1:07] Stats report saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.stats.tsv
[1:07] Generating spectral library:
[1:07] 762 precursors passing the FDR threshold are to be extracted
[1:07] Loading run Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\Smp_SW_10242023_10.wiff
[1:38] 15017 library precursors are potentially detectable
[1:38] 198 spectra added to the library
[1:38] Saving spectral library to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\outputLib_test.tsv
[1:39] 762 precursors saved
[1:39] Loading the generated library and saving it in the .speclib format
[1:39] Loading spectral library Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\outputLib_test.tsv
[1:39] Spectral library loaded: 99 protein isoforms, 100 protein groups and 762 precursors in 578 elution groups.
[1:39] Protein names missing for some isoforms
[1:39] Gene names missing for some isoforms
[1:39] Library contains 0 proteins, and 0 genes
[1:39] Saving the library to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\outputLib_test.tsv.speclib
[1:39] Log saved to Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.log.txt
Finished

DIA-NN exited
DIA-NN-plotter.exe "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.stats.tsv" "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.tsv" "Z:\ZT7600Data\SWATHtest\Smp\2023_1025\Smp\test\report.pdf"
PDF report will be generated in the background

vdemichev commented 11 months ago

Hi Lie,

One solution is to use the main report and filter it at a more relaxed q-value, such as 5%. Another option is to make sure the FASTA includes common contaminants and any other background proteins. Ideally, DIA-NN should search the data against everything that is present in the samples.

Best, Vadim
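
For anyone landing here with the same problem, the filtering suggested above might look roughly like the sketch below. It is only an illustration, not part of DIA-NN: the function name filter_report and the sample rows are made up, the Protein.Group and Precursor.Id column names are assumed to match the main report, and applying the relaxed 5% threshold to PG.Q.Value (while keeping Q.Value at 1%) is one possible reading of the advice.

```python
import csv
import io


def filter_report(tsv_text, precursor_q=0.01, protein_q=0.05):
    """Return report rows passing both q-value thresholds.

    Hypothetical helper: keeps a row when Q.Value <= precursor_q and
    PG.Q.Value <= protein_q (the relaxed, 5% protein-group cutoff).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        passes_precursor = float(row["Q.Value"]) <= precursor_q
        passes_protein = float(row["PG.Q.Value"]) <= protein_q
        if passes_precursor and passes_protein:
            kept.append(row)
    return kept


# Made-up rows standing in for report.tsv; real reports have many more columns.
SAMPLE = (
    "Protein.Group\tPrecursor.Id\tQ.Value\tPG.Q.Value\n"
    "P1\tPEPTIDEK2\t0.001\t0.03\n"
    "P2\tANOTHERK2\t0.004\t0.20\n"
    "P3\tTHIRDPEPK3\t0.020\t0.02\n"
)

rows = filter_report(SAMPLE)
groups = sorted({row["Protein.Group"] for row in rows})
print(groups)
```

To run this on a real report, read the file contents (for example with open(path, encoding="utf-8").read()) and pass the text to filter_report. Keep in mind that a 5% cutoff accepts more false protein identifications than 1%, so the resulting list should be treated accordingly.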