Open Giu-F opened 2 months ago
Hi Giulia,
No, currently only 0.9 and 0.99 are available, and this is not implemented in the R package. This is quite untested (and not investigated in the literature), so I would recommend going with 0.9 for extra confidence. Also, the number of IDs at 0.9 is typically almost the same as at 0.75.
For R, you can try https://github.com/tvpham/msproteomics - it's specifically meant to process DIA-NN output (the main report).
Best, Vadim
Hi Vadim, Thanks for the reply. The package you linked has almost no explanation of how to use it… and it’s for Python. I am working on a phospho SILAC dataset, and you recommend using the report for analysis. So even if I go for 0.9, I still need to apply the Top 1 method on the main report. Any chance you are planning to implement it in the Diann R package any time soon?
The 0.9 and 0.99 matrices are ready to use, i.e. no need to apply Top 1. But they are not channel-specific. So if you'd like to do phosphosite quant in a channel-specific manner, you do indeed need to write custom scripts for that. It should not be too difficult though. Yes, we will likely implement this in the future, I will add it to the todo list, but it most likely will not appear in the next update.
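For anyone writing such a custom script, here is a minimal sketch of a channel-specific Top 1 step, assuming Top 1 means taking the highest precursor quantity among precursors that localise the site at or above the cutoff. The dictionary keys (`run`, `channel`, `site`, `confidence`, `quantity`) and the function name `top1_site_quant` are hypothetical placeholders, not DIA-NN column names; map them to whichever report columns you decide to use.

```python
def top1_site_quant(rows, cutoff=0.75):
    """Return {(run, channel, site): highest quantity among confident precursors}.

    Assumption: each row is one precursor identification with hypothetical keys
    'run', 'channel', 'site', 'confidence' and 'quantity'.
    """
    best = {}
    for row in rows:
        # Drop precursors whose site localisation confidence is below the cutoff.
        if row["confidence"] < cutoff:
            continue
        key = (row["run"], row["channel"], row["site"])
        # Top 1: keep only the largest quantity seen for this key.
        if key not in best or row["quantity"] > best[key]:
            best[key] = row["quantity"]
    return best


# Toy data, made up for illustration only.
rows = [
    {"run": "r1", "channel": "L", "site": ("P1", 549), "confidence": 0.80, "quantity": 100.0},
    {"run": "r1", "channel": "L", "site": ("P1", 549), "confidence": 0.95, "quantity": 250.0},
    {"run": "r1", "channel": "L", "site": ("P1", 549), "confidence": 0.60, "quantity": 900.0},
    {"run": "r1", "channel": "H", "site": ("P1", 549), "confidence": 0.90, "quantity": 40.0},
]
site_quant = top1_site_quant(rows)
```

The 900.0 row is ignored because its confidence is below 0.75, so the light channel gets 250.0 and the heavy channel gets 40.0. Keeping the channel in the key is what makes the result channel-specific.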
While making the custom script, I noticed that the column "Site.Occupancy.Probabilities" matches PTM.Site.Confidence, not Lib.PTM.Site.Confidence. Does a column named "Lib.Site.Occupancy.Probabilities" exist? Are the matrices filtered using these Site.Occupancy.Probabilities or the ones calculated after MBR (hypothetical column "Lib.Site.Occupancy.Probabilities")? Thanks. Br, Giulia
Hi Giulia,
Not sure what you mean by a column matching another column? Lib.Site.Occupancy.Probabilities does not exist.
0.9 and 0.99 phospho matrices are filtered using Site.Occupancy.Probabilities.
Best, Vadim
What if you are using MBR? Shouldn't you filter on Lib.PTM.Site.Confidence? However, if you filter on Site.Occupancy.Probabilities, it's like filtering on PTM.Site.Confidence?
"Shouldn't you filter on Lib.PTM.Site.Confidence?" - not sure if this is necessary.
Further detail needed to implement custom script -- how can one derive Site # info from the parquet output?
What do you mean? There’s a column that lists the positions of the sites within a protein, for each precursor identification. So each pair (protein, site position) is a unique site id. But you only need a custom script if the phosphosite matrices produced by DIA-NN are not suitable for your purposes.
Sorry I'm not so familiar w/ the domain, but as far as I can tell Modified.Sequence gives the location of the phosphorylation within the peptide, but not the index within the whole protein sequence, i.e. Site:549
There’s another column with the site index, I don’t remember the column name (writing from phone), but should be easy to find. The column will have required info if you analysed with phospho specified as var mod
"There’s another column with the site index, I don’t remember the column name (writing from phone)"
Ah thanks, it's Protein.Sites
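Following the (protein, site position) idea above, here is a minimal sketch of collecting unique site IDs once the positions have been extracted. It assumes each row has already been parsed into a protein identifier and a list of integer positions; the keys `protein` and `positions` and the function `unique_site_ids` are hypothetical, and the parsing of the Protein.Sites string itself is left out because its exact format is not given in this thread.

```python
def unique_site_ids(rows):
    """Collect the sorted, de-duplicated (protein, position) pairs across rows.

    Assumption: each row has hypothetical keys 'protein' (str) and
    'positions' (list of int), already parsed from the report.
    """
    sites = set()
    for row in rows:
        for position in row["positions"]:
            # Each (protein, position) pair identifies one site.
            sites.add((row["protein"], position))
    return sorted(sites)


# Toy data, made up for illustration only.
rows = [
    {"protein": "P1", "positions": [549]},
    {"protein": "P1", "positions": [549, 553]},
    {"protein": "P2", "positions": [12]},
]
site_ids = unique_site_ids(rows)
```

Two precursors covering position 549 of the same protein collapse into a single site ID, which is the key you would then group on for site-level quantification.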
"you only need a custom script if the phosphosite matrices produced by DIA-NN are not suitable for your purposes."
It seems the cutoffs are quite conservative. For example, https://www.nature.com/articles/s41467-022-35740-1 seems to report that 0.01 is a fine general cutoff and that even 0.51 is stringent. For a given confidence, DIA-NN has lower empirical FDR than Spectronaut.
Hi, I would like to get phosphosite-level information with the Top 1 method at a 0.75 cutoff, instead of the 0.9 or 0.99 used to generate the matrices. Has the Top 1 method been implemented in the Diann R package?