vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Custom PTM in silico library generation #1207

Open Shinya-Watanabe opened 1 month ago

Shinya-Watanabe commented 1 month ago

Hi Vadim,

I have been using DIA-NN through FragPipe to analyze custom PTMs (one is in UniMod, the other is not). I managed to run it, but PTMProphet has to be disabled because it has an issue with custom PTMs on the protein C-terminus (https://github.com/Nesvilab/FragPipe/issues/1815). I assume that disabling PTMProphet is why there are no PTM localization probability columns in the report files. Also, DIA-NN in FragPipe may not be able to run a library-free search, so I tried to use the DIA-NN GUI to generate an in silico library with custom PTMs. It does not generate the library unless deep-learning-based prediction is on, but I read somewhere in the issues that it does not support custom PTMs.

1) Can you tell me how to generate an in silico spectral library? 2) When I analyze .d files using that library, what check boxes do I need to turn off?

10_10_2024 09_47_01.txt Screenshot 2024-10-10 103959

vdemichev commented 1 month ago

Hi,

> It does not generate the library unless deep-learning-based prediction is on

Yes, it must always be on when generating an in silico lib.

> but I read somewhere in the issues that it does not support custom PTMs

The performance is suboptimal with PTMs it has not been trained on. It still works though.

> When I analyze .d files using that library, what check boxes do I need to turn off?

MBR & 'Generate spectral library' should be selected, everything else should be default.

Btw, please note that in the screenshot --var-mod is used with incorrect syntax; DIA-NN will not understand the meaning of '*c'.

Best, Vadim

Shinya-Watanabe commented 1 month ago

Thank you, Vadim.

I will try that. Can you provide me with the correct syntax for a protein C-terminal modification? I modified the example from the documentation, "--var-mod UniMod:1,42.010565,*n". Also, I cannot find the syntax for the command-line options. There is a list of command-line options, but I do not know what values can go in them.

Best, Shinya

vdemichev commented 1 month ago

> for protein c terminal modification?

Not supported in DIA-NN at the moment unfortunately. Thank you for pointing this out, I've added this to the todo list.

Shinya-Watanabe commented 1 month ago

Thank you! Looking forward to it.

If that is the case, my data look strange. When I used FragPipe for this, I got peptides with protein C-terminal modifications (please see attached; I filtered "Modified.Sequence" for entries ending with "734)". The modification was searched on the protein C-terminus, D, or E, so anything modified with UniMod:734 at a peptide C-terminus other than D or E is a protein C-terminus). Is this because I generated the spectral library using FragPipe?

report.pr_matrix.xlsx report.log.txt

vdemichev commented 1 month ago

I can't really comment on FragPipe algorithms (better to ask the FragPipe team, they are usually very helpful on GitHub), but with regard to the spectral library generated by FragPipe: you can just examine it in R or Python. It's a simple text table (library.tsv), so you can see what is there. DIA-NN just searches the peptides in the library.
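
A minimal sketch of that kind of inspection in Python, using only the standard library. The column name `ModifiedPeptideSequence` is an assumption about the library.tsv layout and may need adjusting, and `cterm_modified` is a hypothetical helper, not part of DIA-NN or FragPipe. It lists the distinct modified sequences that end with a modification token. A sequence ending in a modified residue is also reported, so the hits still need to be checked against the protein sequences.

```python
import csv
import io

def cterm_modified(tsv_text, column="ModifiedPeptideSequence"):
    """Return distinct modified sequences that end with a modification token.

    `column` is an assumed column name; change it to match the actual table.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = {}  # dict keys keep first-seen order and drop duplicates
    for row in reader:
        seq = row[column].strip()
        # A modification name is written in [ ] or ( ), so a sequence that
        # ends with a closing bracket carries a modification at its end.
        if seq.endswith((")", "]")):
            hits[seq] = None
    return list(hits)

# Tiny in-memory stand-in for library.tsv; sequences are taken from this thread.
sample = (
    "ModifiedPeptideSequence\tPrecursorCharge\n"
    "E[216.07462]VAGAKPHITAAEGK\t2\n"
    "AENLGGPGAGAGTLAGKDA.[87.026746]\t2\n"
    "AENLGGPGAGAGTLAGKDA.[87.026746]\t3\n"
)
print(cterm_modified(sample))
```

For a real file, the text can be read with `open("library.tsv", newline="").read()` and passed to the same function.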

Shinya-Watanabe commented 1 month ago

I just took a quick look at the library.tsv generated by FragPipe, and it contains peptides with modifications on the protein C-terminus. I will check whether FragPipe can make an in silico spectral library with protein C-terminal modifications, so that I can use it to run DIA-NN.

Thank you for your help, Vadim! Best, Shinya

vdemichev commented 1 month ago

> I just took a quick look at the library.tsv generated by FragPipe, and it contains peptides with modifications on the protein C-terminus

So does searching with it using DIA-NN produce an expected result?

Shinya-Watanabe commented 1 month ago

Yes, I got a somewhat expected result (some peptides modified at the protein C-terminus), but not the best one (PTM probability info is missing), probably due to PTMProphet. I attached the FragPipe log. However, PTMProphet also has an issue with searching modifications on the protein C-terminus. I asked the FragPipe team about this (https://github.com/Nesvilab/FragPipe/issues/1815), and they are working on it. I assume the DIA-NN output (e.g., report_pr_matrix.tsv) does not contain PTM probabilities because PTMProphet is disabled.

Let me know if you want me to test something or provide you with my dataset.

log_2024-10-06_07-22-52.txt

vdemichev commented 1 month ago

> does not contain PTM probability because of disabled PTMProphet.

You need to use --peptidoforms; DIA-NN will then produce peptidoform q-values. For localisation, you need to declare the modifications with --var-mod.

Best, Vadim

Shinya-Watanabe commented 1 month ago

1) I tried --peptidoforms in "cmd line opts" for DIA-NN in FragPipe, and it returned a warning: "WARNING: unrecognised option [--peptidoforms]".
2) "--var-mod" for DIA-NN in FragPipe did not produce localization probability columns in the output.
3) I tried the DIA-NN GUI with the "library.tsv" created by FragPipe. An unknown modification error terminated the process:
ERROR: D:\diann\src\diann.cpp: 4064: unknown modification: 216.07462
I did not add a modification mass of 216.07462 when creating the spectral library, but it is in library.tsv. It may be produced by FragPipe for some reason.

vdemichev commented 1 month ago

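This happens because FragPipe packages an old DIA-NN version. Solutions:

1) Use the library generated by FragPipe in DIA-NN 1.9.1.
2) Use --monitor-mod UniMod:734 --monitor-mod PS instead of --peptidoforms (--var-mod also needs to be specified).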

> I did not add modification mass of 216.07462 when creating the spectral library, but it is in library.tsv. It may be produced by FragPipe for some reason.

You can just declare it with --mod.

Shinya-Watanabe commented 1 month ago

> Use the library generated by FragPipe in DIA-NN 1.9.1.

This worked. I downloaded DIA-NN 1.9.1 and swapped it in for the old DIA-NN in FragPipe.

> Use --monitor-mod UniMod:734 --monitor-mod PS instead of --peptidoforms (--var-mod also needs to be specified).

Thank you. I will try this too, but I figured out that the modification name in the spectral library created by FragPipe is either UniMod:XXX (if the mass matches an entry in UniMod) or the mass itself (e.g., 87.03203). Thus, I needed to declare --mod 87.03203 or --var-mod 87.03203,87.03203,DE. Also, if the modification mass is not in UniMod, FragPipe creates library.tsv with the amino acid + modification mass as the modification name (e.g., for E, 216.07462; for D, 202.05896; for C-term, 87.026746). Here are some examples:

E[216.07462]VAGAKPHITAAEGK
AENLGGPGAGAGTLAGKDA.[87.026746]
E(UniMod:734)VD[202.05896]ATSPAPSTSSTVK

Best, Shinya
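
The "amino acid + modification mass" naming described above can be checked with a little arithmetic. The sketch below is only an illustration: `delta_masses` is a hypothetical helper, and the residue masses are the standard monoisotopic values for Asp and Glu. It subtracts the residue mass from each numeric name that directly follows D or E. Names that follow a "." (the C-term case) are deliberately skipped, since they are not attached to a residue letter.

```python
import re

# Standard monoisotopic residue masses (Da) for the two residues in the thread.
RESIDUE_MASS = {"D": 115.02694, "E": 129.04259}

# A residue letter immediately followed by a purely numeric name in [ ].
NUMERIC_MOD = re.compile(r"([A-Z])\[(\d+\.\d+)\]")

def delta_masses(modified_sequence):
    """Return (residue, name, name minus residue mass) for D/E numeric names."""
    out = []
    for residue, name in NUMERIC_MOD.findall(modified_sequence):
        if residue in RESIDUE_MASS:
            delta = round(float(name) - RESIDUE_MASS[residue], 5)
            out.append((residue, name, delta))
    return out

for seq in (
    "E[216.07462]VAGAKPHITAAEGK",
    "AENLGGPGAGAGTLAGKDA.[87.026746]",
    "E(UniMod:734)VD[202.05896]ATSPAPSTSSTVK",
):
    print(seq, delta_masses(seq))
```

For the E and D examples the difference comes out at about 87.032 Da, which agrees with the 87.03203 value declared with --var-mod above.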

vdemichev commented 1 month ago

Hi Shinya,

Yes, it's fine if the name is different. DIA-NN accepts arbitrary strings in brackets ([ ] or ( )) as modification names; it is just important to let DIA-NN know about them using --mod, --var-mod or --fixed-mod.

Best, Vadim
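
One way to catch an "unknown modification" error before starting a search is to list every bracketed name in the library and compare it with the names already declared. This is a rough sketch, not DIA-NN's own parsing: `undeclared_mods` is a hypothetical helper, and `declared` stands for whatever names were passed via --mod, --var-mod or --fixed-mod.

```python
import re

# Any run of characters enclosed in [ ] or ( ) is treated as a modification name.
MOD_NAME = re.compile(r"[\[(]([^\])]+)[\])]")

def undeclared_mods(sequences, declared):
    """Return modification names found in `sequences` but absent from `declared`."""
    missing = []
    for seq in sequences:
        for name in MOD_NAME.findall(seq):
            if name not in declared and name not in missing:
                missing.append(name)
    return missing

library_sequences = [
    "E[216.07462]VAGAKPHITAAEGK",
    "AENLGGPGAGAGTLAGKDA.[87.026746]",
    "E(UniMod:734)VD[202.05896]ATSPAPSTSSTVK",
]
# Assumption for the example: only UniMod:734 has been declared so far.
print(undeclared_mods(library_sequences, {"UniMod:734"}))
```

Each name the function returns would then need its own declaration, as described above.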

Shinya-Watanabe commented 1 month ago

Thank you, Vadim! I am looking forward to support for protein C-terminal modifications.

Best, Shinya