vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

WARNING: 70655 precursors were wrongly annotated in the library as proteotypic #1148

Open Mnago opened 2 months ago

Mnago commented 2 months ago

Hello, I encountered an issue as described in the title.

Building Spectral Library using DIA-NN v1.9.1

I created a spectral library using DIA-NN 1.9.1 with the following parameters. The output file, report-lib.parquet, serves as the spectral library.

diann.exe --f "E:\isoform_test\20240205_EV_24minDIA_30F_F1.raw.dia " --f "E:\isoform_test\20240205_EV_24minDIA_30F_F2.raw.dia " --lib "E:\isoform_test\report-lib.predicted.speclib" --threads 44 --verbose 1 --out "E:\isoform_test\report.tsv" --qvalue 0.01 --matrices --out-lib "E:\isoform_test\report-lib.parquet" --gen-spec-lib --prosit --unimod4 --var-mods 1 --var-mod UniMod:35,15.994915,M --use-quant --individual-mass-acc --individual-windows --no-prot-inf --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling

DIA Analysis using DIA-NN v1.9.1

The corresponding parameters are as follows, and a screenshot of the log file is included below.

diann.exe --f "E:20240826_LungCa_24minDIA_51.raw" --lib "D:\human_library_with_isoform\report-lib.parquet" --threads 64 --verbose 1 --out "F:\DIANN191_isoform_libray_based_plasma_serum_report20240828.tsv" --qvalue 0.01 --matrices --unimod4 --var-mods 1 --var-mod UniMod:35,15.994915,M --individual-mass-acc --individual-windows --no-prot-inf --peptidoforms --reanalyse --relaxed-prot-inf --rt-profiling ......

26 files will be processed
[0:00] Loading spectral library D:\human_library_with_isoform\report-lib.parquet
[0:04] Spectral library loaded: 37043 protein isoforms, 31019 protein groups and 158861 precursors in 142624 elution groups.
[0:05] Initialising library
[0:05] Saving the library to D:\human_library_with_isoform\report-lib.parquet.skyline.speclib

Why does this warning appear in the log file?

[Screenshots: issue2, issue1]

Thanks

vdemichev commented 2 months ago

Hi, thanks for reporting this. The warning occurs when the library was created with a different proteotypicity definition from the one used during the analysis. It seems that if you switch protein inference off during library creation, DIA-NN still annotates proteins based on 'Genes', the default, i.e. protein inference = off is ignored during library creation. During the analysis, on the other hand, you set protein inference = off, which among other things switches proteotypicity to 'isoforms', and this leads to the warning. That being said, protein inference should be switched off in only one situation: when you are analysing with an empirical library that already contains protein groups. In all other cases it should be set to whatever proteotypicity definition best suits the experiment.

Best, Vadim
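
To make the mismatch concrete, here is a toy sketch of how the same peptide can count as proteotypic under a gene-level definition but not under an isoform-level one. This is not DIA-NN's implementation: the function names, the isoform and gene identifiers, and the peptide sequences are all hypothetical, and the only assumption is that "proteotypic" means the peptide maps to exactly one entity at the chosen level.

```python
# Toy model only, not DIA-NN code. All names and data below are hypothetical.

def is_proteotypic(isoform_ids, level, isoform_to_gene):
    """Return True if the peptide maps to exactly one entity at `level`."""
    if level == "isoforms":
        entities = set(isoform_ids)
    elif level == "genes":
        entities = {isoform_to_gene[i] for i in isoform_ids}
    else:
        raise ValueError(f"unknown level: {level}")
    return len(entities) == 1


def count_mismatches(peptides, library_level, analysis_level, isoform_to_gene):
    """Count peptides proteotypic at library_level but not at analysis_level."""
    return sum(
        1
        for ids in peptides.values()
        if is_proteotypic(ids, library_level, isoform_to_gene)
        and not is_proteotypic(ids, analysis_level, isoform_to_gene)
    )


# Hypothetical mapping: two isoforms of GENEA, one isoform of GENEB.
isoform_to_gene = {"A-1": "GENEA", "A-2": "GENEA", "B-1": "GENEB"}

# Hypothetical peptides and the isoforms that contain them.
peptides = {
    "PEPTIDEA": ["A-1", "A-2"],  # shared by two isoforms of one gene
    "PEPTIDEB": ["B-1"],         # unique at both levels
    "PEPTIDEC": ["A-1", "B-1"],  # shared across genes
}

# Library annotated at gene level, analysis run at isoform level.
mismatched = count_mismatches(peptides, "genes", "isoforms", isoform_to_gene)
print(f"{mismatched} peptide(s) proteotypic in the library but not in the analysis")
```

In this toy data only PEPTIDEA is affected: it is unique to GENEA at the gene level but shared between two isoforms. A library built from a FASTA file with many isoforms would contain many such peptides, which is consistent with the large count in the warning.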

Mnago commented 2 months ago

Hi Vadim,

Thank you for your prompt response. Since my database construction process also includes isoforms, I would like to confirm a few things with you. Is it correct to choose "protein names (from fasta)" for protein inference in the following three steps?

  1. When generating the predicted library from the fasta file.
  2. When generating the spectral library by combining the predicted library with the nDIA file.
  3. Finally, when analyzing other DIA files using the spectral library.

Alternatively, in the second scenario:

  1. When generating the predicted library from the fasta file, choose "protein names (from fasta)" for protein inference.
  2. When generating the spectral library by combining the predicted library with the nDIA file, choose "protein names (from fasta)" for protein inference.
  3. Finally, when analyzing other DIA files using the spectral library, choose "off" for protein inference.

Are these choices correct for handling isoforms in both scenarios?

Thanks