vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Generating spectral library with PTMs #839

Open buijt opened 10 months ago

buijt commented 10 months ago

Hello,

I am trying to create an in silico predicted spectral library in DIA-NN, and I am interested in detecting variable carbamylation of lysine (K) and arginine (R), as well as variable carbamylation of the peptide N-terminus. Specifically, I would like to simulate UniMod:5 and use this library to identify carbamylation in some human proteome data.

I looked at the command-line syntax that is provided for some of the PTMs built into DIA-NN, and determined that I should specify the modification like this when running DIA-NN:

--var-mod UniMod:5,43.005814,KR*n --monitor-mod UniMod:5
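As a sanity check on the mass delta in this option, the value can be recomputed from the elemental composition of the carbamyl group. This is a minimal sketch, assuming the UniMod:5 composition of one C, one H, one N and one O (HNCO) and standard monoisotopic masses; the helper name `composition_mass` is made up for illustration and is not part of DIA-NN.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def composition_mass(composition):
    """Sum monoisotopic masses for an {element: count} mapping (hypothetical helper)."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Assumption: carbamylation adds HNCO to the modified site.
carbamyl_delta = composition_mass({"C": 1, "H": 1, "N": 1, "O": 1})
print(f"{carbamyl_delta:.6f}")  # 43.005814
```

The result agrees with the 43.005814 passed to --var-mod, so the mass itself does not look like the source of the problem.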

However, when I used the generated library to search the data, no peptide sequences were tagged with UniMod:5, and when I converted the library to TSV and inspected it, I found no spectra simulated with UniMod:5.
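The TSV inspection described here can be scripted rather than done by eye. The following is only a sketch: it assumes the exported library is tab-separated with one header line and that modifications appear in text form such as `(UniMod:5)` somewhere in each row. The function name `count_mod_rows` and the column names in the sample are illustrative, not taken from DIA-NN.

```python
import csv
import re


def count_mod_rows(lines, mod="UniMod:5"):
    """Return (rows mentioning `mod`, total data rows) for tab-separated lines."""
    # The negative lookahead stops "UniMod:5" from also matching e.g. "UniMod:56".
    pattern = re.compile(re.escape(mod) + r"(?!\d)")
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header line
    hits = total = 0
    for row in reader:
        total += 1
        if any(pattern.search(field) for field in row):
            hits += 1
    return hits, total


# Made-up sample rows; a real library has many more columns.
sample = [
    "ModifiedPeptide\tPrecursorCharge",
    "AAAK(UniMod:5)LR\t2",
    "M(UniMod:35)PEPTIDEK\t2",
    "PEPTIDEK\t3",
]
hits, total = count_mod_rows(sample)
print(hits, total)  # 1 3
```

For a real file, an open file handle can be passed in place of the list, for example `count_mod_rows(open(path, newline=""))`. The count is per row, so a library that stores one fragment per row will report fragment rows rather than distinct peptides.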

I suspected that perhaps the syntax specifying KR*n was problematic, so I also tried to only run:

--var-mod UniMod:5,43.005814,KR --monitor-mod UniMod:5

but the library still does not report any modified peptides containing UniMod:5.

Could you help me determine what went wrong with the PTM simulation? How can I define additional PTMs beyond the ones available in the DIA-NN GUI and have them included in the in silico predicted spectral library?

Thank you as well for developing and maintaining the DIA-NN software. I look forward to your reply.

DIA-NN Log:

DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 15 2022 08:45:18
Current date and time: Mon Oct 16 16:43:51 2023
Logical CPU cores: 56
/usr/diann/1.8.1/diann --out-lib 202304_humanpredicted_diannlib_ptms.tsv --gen-spec-lib --predictor --fasta /common/buij4/library/uniprot_human_20230424.fasta --fasta-search --min-fr-mz 200 --max-fr-mz 1800 --min-pr-mz 400 --max-pr-mz 1000 --min-pr-charge 1 --max-pr-charge 4 --missed-cleavages 2 --max-pep-len 30 --min-pep-len 7 --cut K*,R* --met-excision --fixed-mod UniMod:4,57.021464,C --var-mod UniMod:35,15.994915,M --var-mod UniMod:5,43.005814,KR*n --var-mods 3 --smart-profiling --reanalyse --report-lib-info --qvalue 0.01 --threads 24 --verbose 1 --temp /common/buij4/library/temp/

A spectral library will be generated
Deep learning will be used to generate a new in silico spectral library from peptides provided
Library-free search enabled
Min fragment m/z set to 200
Max fragment m/z set to 1800
Min precursor m/z set to 400
Max precursor m/z set to 1000
Min precursor charge set to 1
Max precursor charge set to 4
Maximum number of missed cleavages set to 2
Max peptide length set to 30
Min peptide length set to 7
In silico digest will involve cuts at K*,R*
N-terminal methionine excision enabled
Modification UniMod:4 with mass delta 57.0215 at C will be considered as fixed
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:5 with mass delta 43.0058 at KR*n will be considered as variable
Maximum number of variable modifications set to 3
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
A spectral library will be created from the DIA runs and used to reanalyse them; .quant files will only be saved to disk during the first step
Output will be filtered at 0.01 FDR
Thread number set to 24
Exclusion of fragments shared between heavy and light peptides from quantification is not supported in FASTA digest mode - disabled; to enable, generate an in silico predicted spectral library and analyse with this library
WARNING: MBR turned off, two or more raw files are required

0 files will be processed
[0:00] Loading FASTA /common/buij4/library/uniprot_human_20230424.fasta
[0:16] Processing FASTA
[1:35] Assembling elution groups
[2:29] 28821789 precursors generated
[2:29] Gene names missing for some isoforms
[2:29] Library contains 20401 proteins, and 20183 genes
[2:33] [3:14] [14:50] [16:39] [16:56] [17:15] Saving the library to 202304_humanpredicted_diannlib_ptms.predicted.speclib
[18:10] Initialising library

Finished

vdemichev commented 10 months ago

The in silico predictor will skip modifications on which it has not been trained, unless --strip-unknown-mods is added to 'Additional options'.

buijt commented 10 months ago

Thank you for your reply. Just to clarify: the README, in its command-line options section, says:

--strip-unknown-mods instructs DIA-NN to ignore modifications that are not supported by the deep learning predictor, when performing the prediction

If I understand this properly, then using this option will not allow me to simulate arbitrary modifications in the in-silico library.

In my original post, I had not enabled this option, so DIA-NN tried to simulate all the (unknown) modifications that I specified. As the log shows, DIA-NN tries to simulate:

Modification UniMod:4 with mass delta 57.0215 at C will be considered as fixed
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:5 with mass delta 43.0058 at KR*n will be considered as variable

However, the resulting library contains 0 peptides labeled with UniMod:5, which means that DIA-NN failed to simulate this PTM.
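To compare what DIA-NN registered against what ends up in the library, the modification lines can be extracted from the log text. This is a small sketch, assuming the log wording matches the three lines quoted above exactly; `parse_mod_lines` is a hypothetical helper and the regular expression is not part of DIA-NN.

```python
import re

# Matches lines like:
# "Modification UniMod:5 with mass delta 43.0058 at KR*n will be considered as variable"
MOD_LINE = re.compile(
    r"Modification (?P<name>\S+) with mass delta (?P<delta>[\d.]+) "
    r"at (?P<sites>\S+) will be considered as (?P<kind>fixed|variable)"
)


def parse_mod_lines(log_text):
    """Return (name, mass delta, sites, kind) for each modification line found."""
    return [
        (m["name"], float(m["delta"]), m["sites"], m["kind"])
        for m in MOD_LINE.finditer(log_text)
    ]


log_excerpt = """\
Modification UniMod:4 with mass delta 57.0215 at C will be considered as fixed
Modification UniMod:35 with mass delta 15.9949 at M will be considered as variable
Modification UniMod:5 with mass delta 43.0058 at KR*n will be considered as variable
"""

for name, delta, sites, kind in parse_mod_lines(log_excerpt):
    print(name, delta, sites, kind)
```

These lines only show that the options were parsed. Whether the predictor then generated spectra for peptides carrying a given modification has to be checked in the library itself.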

Am I understanding the software correctly? Thanks in advance.