vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Recommendations on DIA-NN Phosphorylation Reports #1079

Open crf2213 opened 1 week ago

crf2213 commented 1 week ago

Hi, thanks for your work on DIA-NN. I'm currently using the latest version of DIA-NN to process phosphoproteomics data, and I have obtained wonderful results, including tables of phosphorylation sites with localisation confidence exceeding 0.99 and 0.9, respectively. These results have greatly facilitated our work. However, I would like to combine these two sets of results into one table with an additional column for phosphosite confidence scores. For datasets with multiple parallel (replicate) samples, the highest score for a phosphorylation site could represent that site's probability across all samples. This way, we could easily filter reliable phosphosites using custom criteria, as in the table below:

[Image 1] I'd be grateful if you could take this into consideration!

vdemichev commented 1 week ago

Hi,

You can obtain the full phosphosite information from the main report in .parquet format, i.e. confidence levels for individual phosphosites and information on their location within the protein. It just requires an R or Python script to extract it and make a nice table. This is already done automatically at 0.9 and 0.99 site confidence levels, and using one of these is a good idea for most experiments. But if you need more information reported, please just use the main .parquet report.
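As a rough illustration of the kind of script meant here, the sketch below builds the single table requested in the first post: one row per site with its highest localisation probability across runs, then filters it at a custom threshold. It assumes the Site.Occupancy.Probabilities and Protein.Sites columns have already been parsed into hypothetical (run, site, probability) records; that parsing step, the site IDs, and the "max across runs" criterion (the requester's proposal, not something DIA-NN does) are all assumptions.

```python
from collections import defaultdict

# Hypothetical long-format records: (run, site, site_probability).
# Converting the real Site.Occupancy.Probabilities / Protein.Sites columns
# of the main .parquet report into this shape is assumed to be done already.
records = [
    ("run1", "P12345:S23", 0.995),
    ("run2", "P12345:S23", 0.93),
    ("run1", "P12345:T45", 0.91),
    ("run2", "Q67890:Y102", 0.62),
]

def best_site_confidence(records):
    """Return {site: highest localisation probability across all runs}."""
    best = defaultdict(float)
    for _run, site, prob in records:
        best[site] = max(best[site], prob)
    return dict(best)

def filter_sites(best, threshold):
    """Keep sites whose best confidence reaches a user-chosen threshold."""
    return {site: p for site, p in best.items() if p >= threshold}

best = best_site_confidence(records)
print(filter_sites(best, 0.99))         # {'P12345:S23': 0.995}
print(sorted(filter_sites(best, 0.9)))  # ['P12345:S23', 'P12345:T45']
```

Keeping the per-run records around (rather than only the maximum) would also let you apply stricter rules, such as requiring a threshold in at least N runs.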

Best, Vadim

crf2213 commented 1 week ago

Thanks for your reply. I have another question about the main .parquet report. The quantification information in the report is at the precursor level. When it comes to the phosphosite level, can I directly sum the Precursor.Normalised intensities of all peptides corresponding to one phosphosite? Or are there other intensity values or a different summation algorithm?

vdemichev commented 1 week ago

The 0.9 and 0.99 matrices are generated by taking, in a given run, the maximum intensity across all precursors on which the respective phosphosite is localised with 0.9 or, respectively, 0.99 confidence. I would suggest using the same strategy; it's more robust to possible identification errors than summing.
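A minimal sketch of that max-based rule, assuming precursor rows have already been reduced to hypothetical (run, precursor, site, site_probability, intensity) tuples. Which intensity column DIA-NN itself uses for the matrices is not stated here, so using something like Precursor.Normalised is an assumption; the precursor and site names are made up.

```python
from collections import defaultdict

# Hypothetical per-precursor, per-site rows:
# (run, precursor, site, site_probability, precursor_intensity).
# A precursor carrying two phosphosites would appear once per site.
rows = [
    ("run1", "pepA_2+", "P12345:S23", 0.998, 1.0e6),
    ("run1", "pepA_3+", "P12345:S23", 0.995, 4.0e5),
    ("run1", "pepB_2+", "P12345:S23", 0.95, 2.0e6),  # below 0.99
    ("run2", "pepA_2+", "P12345:S23", 0.999, 8.0e5),
]

def site_matrix(rows, min_prob):
    """Max intensity per (site, run) over precursors that localise the
    site with probability >= min_prob (max rather than sum)."""
    quant = defaultdict(float)
    for run, _prec, site, prob, intensity in rows:
        if prob >= min_prob:
            key = (site, run)
            quant[key] = max(quant[key], intensity)
    return dict(quant)

print(site_matrix(rows, 0.99))
# {('P12345:S23', 'run1'): 1000000.0, ('P12345:S23', 'run2'): 800000.0}
print(site_matrix(rows, 0.9)[("P12345:S23", "run1")])  # 2000000.0
```

With max, a single misidentified precursor can only replace the value if it is the most intense one, whereas with a sum every false precursor inflates the site quantity.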

youngbee12 commented 1 day ago

Hi, if I set the maximum number of variable modifications to 2, a single peptide may carry two phosphorylation sites. How do you organize the final results into a single-site format?

vdemichev commented 1 day ago

Either use the 0.9 and 0.99 matrices, or write a custom script based on the Site.Occupancy.Probabilities and Protein.Sites columns of the main .parquet report.
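One possible starting point for such a custom script, sketched under assumptions: each precursor has already been parsed into a hypothetical dict holding its sites and their probabilities (the real text format of Site.Occupancy.Probabilities and Protein.Sites is not reproduced here). The step shown turns each (precursor, localised site) pair into one row, so a doubly phosphorylated precursor contributes a row to each of its two sites. Giving each site the full precursor intensity is an interpretation consistent with the "all precursors with the respective phosphosite localised" description above, not a confirmed account of DIA-NN's internals.

```python
# Hypothetical parsed precursors; field names and site IDs are illustrative.
precursors = [
    {"run": "run1", "precursor": "pepA_2+", "intensity": 1.0e6,
     "sites": [("P12345:S23", 1.0), ("P12345:T27", 1.0)]},  # doubly phosphorylated
    {"run": "run1", "precursor": "pepC_2+", "intensity": 3.0e5,
     "sites": [("P12345:S23", 0.97)]},
]

def explode_sites(precursors, min_prob):
    """Return one (run, precursor, site, prob, intensity) row per site
    localised with probability >= min_prob. A precursor with two
    qualifying sites yields two rows sharing the same intensity."""
    out = []
    for p in precursors:
        for site, prob in p["sites"]:
            if prob >= min_prob:
                out.append((p["run"], p["precursor"], site, prob, p["intensity"]))
    return out

for row in explode_sites(precursors, 0.99):
    print(row)
# ('run1', 'pepA_2+', 'P12345:S23', 1.0, 1000000.0)
# ('run1', 'pepA_2+', 'P12345:T27', 1.0, 1000000.0)
```

The resulting rows can then be aggregated per site and run with whatever rule you choose, such as the maximum.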

youngbee12 commented 1 day ago

[Image] This is an example. The probabilities of both sites on this peptide are 1, indicating that it may be a diphosphorylated peptide. What strategy did you use when collapsing it into sites? Why not organize it into the form of diphosphorylated sites? Or how did you convert this diphosphorylated peptide into information about single sites?

vdemichev commented 1 day ago

Why not organize it into the form of diphosphorylated sites

What do you mean? In this case, DIA-NN's output tells you the exact configuration of the ion in the mass spec.

youngbee12 commented 21 hours ago

Sorry, I may not have explained it clearly. What I meant is: how do you determine the quantitative level of a single phosphorylation site when a doubly phosphorylated peptide is present? Is there a specific algorithm?

vdemichev commented 20 hours ago

There are some considerations here https://academic.oup.com/bioinformatics/article/40/7/btae432/7701779

youngbee12 commented 20 hours ago

Thank you very much for your detailed answer!!!!