vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Top1 method #1063

Open Giu-F opened 2 months ago

Giu-F commented 2 months ago

Hi, I would like to get phosphosite-level information with the top1 method with a 0.75 cutoff, instead of the 0.9 or 0.99 used to generate the matrices. Has the top1 method been implemented in the Diann R package?

vdemichev commented 2 months ago

Hi Giulia,

No, currently only the 0.9 and 0.99 cutoffs are available, and Top 1 is not implemented in the R package. This thing is quite untested (and not investigated in the literature), so I would recommend going with 0.9, for extra confidence. Also, typically, the number of IDs at 0.9 is almost the same as at 0.75.

For R, can try https://github.com/tvpham/msproteomics - it's specifically meant to process DIA-NN output (the main report).

Best, Vadim

Giu-F commented 2 months ago

Hi Vadim, thanks for the reply. The package you linked has almost no explanation on how to use it… and it’s for Python. I am working on a phospho SILAC dataset, and you recommend using the report for analysis. So even if I go for 0.9, I still need to apply the top1 method on the main report. Any chance you guys are planning to implement it in the Diann R package any time soon?

vdemichev commented 2 months ago

The 0.9 and 0.99 matrices are ready to use, i.e. there is no need to apply Top 1. But they are not channel-specific. So if you'd like to do phosphosite quant in a channel-specific manner, you do indeed need to write custom scripts for that. It should not be too difficult though. Yes, we will likely implement this in the future, and I will add it to the todo list, but it most likely will not appear in the next update.
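For anyone writing such a custom script, here is a minimal sketch of a channel-specific Top 1 aggregation with an adjustable cutoff. It assumes Top 1 means taking, per site, run and channel, the highest quantity among precursors whose site probability passes the cutoff. The input rows are hypothetical and already parsed: the keys `run`, `channel`, `site`, `prob` and `quantity` are illustrative names, not DIA-NN report columns, and mapping the main report onto them is left to the reader.

```python
# Hypothetical, already-parsed rows: one entry per (precursor, site) pair.
# These keys are illustrative, NOT column names from the DIA-NN report.
rows = [
    {"run": "r1", "channel": "L", "site": ("P1", 549), "prob": 0.95, "quantity": 1000.0},
    {"run": "r1", "channel": "L", "site": ("P1", 549), "prob": 0.80, "quantity": 2500.0},
    {"run": "r1", "channel": "H", "site": ("P1", 549), "prob": 0.60, "quantity": 9000.0},
    {"run": "r2", "channel": "L", "site": ("P1", 549), "prob": 0.99, "quantity": 700.0},
]


def top1_site_quant(rows, cutoff=0.75):
    """Return {(site, run, channel): max quantity among rows passing the cutoff}."""
    best = {}
    for row in rows:
        # Drop precursors whose site localisation is below the cutoff.
        if row["prob"] < cutoff:
            continue
        key = (row["site"], row["run"], row["channel"])
        # Top 1: keep only the most intense qualifying precursor per key.
        if key not in best or row["quantity"] > best[key]:
            best[key] = row["quantity"]
    return best


site_quant = top1_site_quant(rows, cutoff=0.75)
for key, quantity in sorted(site_quant.items()):
    print(key, quantity)
```

Keeping the channel in the key is what makes the result channel-specific. Dropping it from the key would collapse the channels into one value per site and run.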

Giu-F commented 2 months ago

While writing the custom script, I noticed that the column "Site.Occupancy.Probabilities" matches PTM.Site.Confidence, not Lib.PTM.Site.Confidence. Does a column named "Lib.Site.Occupancy.Probabilities" exist? Are the matrices filtered using these Site.Occupancy.Probabilities or the ones calculated after MBR (hypothetical column "Lib.Site.Occupancy.Probabilities")? Thanks. Br, Giulia

vdemichev commented 2 months ago

Hi Giulia,

Not sure what you mean by a column matching another column. Lib.Site.Occupancy.Probabilities does not exist.

0.9 and 0.99 phospho matrices are filtered using Site.Occupancy.Probabilities.

Best, Vadim

Giu-F commented 2 months ago

What if you are using MBR? Shouldn't you filter on Lib.PTM.Site.Confidence? However, if you filter on Site.Occupancy.Probabilities, it's like filtering on PTM.Site.Confidence?

vdemichev commented 2 months ago

"Shouldn't you filter on Lib.PTM.Site.Confidence?" - not sure if this is necessary.

sooheon commented 1 month ago

Further detail is needed to implement a custom script: how can one derive the site number from the parquet output?

vdemichev commented 1 month ago

What do you mean? There’s a column that lists the positions of the sites within a protein, for each precursor identification. So each pair (protein, site position) is a unique site id. But you only need a custom script if the phosphosite matrices produced by DIA-NN are not suitable for your purposes.
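A small sketch of the "(protein, site position) is a unique site id" idea, in case it helps. It assumes the site column has already been parsed into a list of integer positions per precursor. The parsing itself is not shown because it depends on the exact string format of that column, and the keys `precursor`, `protein` and `positions` below are hypothetical names used only for illustration.

```python
# Hypothetical, already-parsed precursor identifications.
# "positions" holds site positions within the protein sequence.
precursors = [
    {"precursor": "pep_A", "protein": "P1", "positions": [549]},
    {"precursor": "pep_B", "protein": "P1", "positions": [549, 551]},
    {"precursor": "pep_C", "protein": "P2", "positions": [549]},
]


def explode_sites(precursors):
    """Yield one (site_id, precursor) pair per site, with site_id = (protein, position)."""
    for entry in precursors:
        for position in entry["positions"]:
            yield (entry["protein"], int(position)), entry["precursor"]


per_site_rows = list(explode_sites(precursors))
# The same position on different proteins gives different site ids.
unique_sites = sorted({site_id for site_id, _ in per_site_rows})
print(unique_sites)
```

A multiply phosphorylated precursor contributes one row per site, so it can count towards several site ids at once.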

sooheon commented 1 month ago

Sorry, I'm not so familiar w/ the domain, but as far as I can tell Modified.Sequence gives the location of the phosphorylation within the peptide, but not the index within the whole protein sequence, i.e. Site:549.

vdemichev commented 1 month ago

There’s another column with the site index. I don’t remember the column name (writing from phone), but it should be easy to find. The column will have the required info if you analysed with phospho specified as a var mod.

sooheon commented 1 month ago

"There’s another column with the site index, I don’t remember the column name (writing from phone)"

Ah thanks, it's Protein.Sites

"you only need a custom script if the phosphosite matrices produced by DIA-NN are not suitable for your purposes."

The cutoffs seem quite conservative: https://www.nature.com/articles/s41467-022-35740-1 seems to report that 0.01 is a fine general cutoff and that even 0.51 is stringent, for example. For a given confidence, DIA-NN has a lower empirical FDR than Spectronaut.