vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Deriving Phospho Site Tables from the output report #1174

Open misak-acrivon opened 1 month ago

misak-acrivon commented 1 month ago

Hi Vadim,

I would like to know how to derive the phosphosite tables from the main DIA-NN report. Which columns and filters would I need to apply to derive the table? I realize that, after applying these filters, I would also need to reshape the report from long to wide format.

Thank you very much in advance!

Marc

vdemichev commented 1 month ago

Hi Marc,

You can use the 0_9 and 0_99 phosphosite matrices; those summarise things in a convenient format.

Best, Vadim

misak-acrivon commented 1 month ago

Unfortunately, I would like to derive these tables for localisation scores >= 0.75. So I guess I need to start from the parquet DIA-NN report?

vdemichev commented 1 month ago

> I would like to derive these tables for localisation scores >= 0.75

The scores for the tables produced by DIA-NN are 0.9 and 0.99 respectively

> So I guess I need to start from the parquet DIA-NN report?

You can, if you would like to do some fancy filtering. Can you please elaborate on what is not clear about the .parquet report contents? There's a column identifying the sites and a column with their localisation scores; Peptidoform.Q.Value and/or Lib.Peptidoform.Q.Value are also good to use for filtering. That's basically it, nothing sophisticated really with those tables DIA-NN produces :)
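A minimal sketch in base R of the filtering described above, run on a toy long-format table rather than a real report. `Peptidoform.Q.Value` is the column named in this thread; the column names `Site`, `Site.Score`, `Run` and `Intensity`, the 0.01 q-value cut-off, and taking the maximum intensity per site and run are all assumptions made for illustration, not a description of what DIA-NN does.

```r
# Toy long-format table standing in for the .parquet report.
# Site, Site.Score, Run and Intensity are hypothetical column names.
report <- data.frame(
  Run                 = c("run1", "run1", "run2", "run2", "run2"),
  Site                = c("P1_S10", "P1_T20", "P1_S10", "P1_T20", "P1_S10"),
  Site.Score          = c(0.95, 0.60, 0.80, 0.99, 0.78),
  Peptidoform.Q.Value = c(0.001, 0.002, 0.005, 0.200, 0.003),
  Intensity           = c(100, 50, 120, 70, 90),
  stringsAsFactors    = FALSE
)

# Keep rows whose site score and q-value both pass the chosen thresholds.
filter_sites <- function(df, min_score = 0.75, max_q = 0.01) {
  keep <- df$Site.Score >= min_score & df$Peptidoform.Q.Value <= max_q
  df[keep, , drop = FALSE]
}

# Long to wide: rows are sites, columns are runs.
# When several rows fall into one cell, the maximum intensity is used
# (an arbitrary choice for this sketch).
to_wide <- function(df) {
  tapply(df$Intensity, list(df$Site, df$Run), max)
}

site_matrix <- to_wide(filter_sites(report))
print(site_matrix)
```

In this toy table only `P1_S10` survives: `P1_T20` fails the score threshold in `run1` and the q-value threshold in `run2`. Changing `min_score` is all that is needed to move between 0.75, 0.9 and 0.99.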

vdemichev commented 1 month ago

These columns: [image]

They are only included in the report if phospho is declared as --var-mod and peptidoform scoring is enabled (the GUI does this by default when the phospho option is ticked).

misak-acrivon commented 1 month ago

Thanks Vadim,

the problem is that even if I filter based on these columns, I cannot recreate the phospho site tables given by DIA-NN at localization scores of 0.9 and 0.99 respectively. It means I am doing something wrong here, or that I am not accounting for some information when filtering.

If not too inconvenient, could you perhaps take a look at my small R-script (attached as .txt since .R is not a supported file type) that runs the filtering and spot where things might go wrong? I suppose it happens under the section 'Extract phospho sites passing filtering criteria...'

make_diann_psite_table.txt

Thank you in advance!

Marc

vdemichev commented 1 month ago

> I cannot recreate the phospho site tables given by DIA-NN

The exact filters are:

But I suggest not trying to reproduce those matrices. How best to quantify phosphosites (with what filtering) is an open question; there's no definitive answer in the literature. What you come up with by trying different filters on a specific experiment may well end up better than what DIA-NN does by default when generating those matrices.

Best, Vadim

misak-acrivon commented 1 month ago

Thanks Vadim!

I am definitely getting closer after you specified the new filters, but the matrices are still not the same. I agree with you that tweaking the filters could perhaps generate better reports than what DIA-NN does for specific experiments. However, it would be nice to have baseline settings established that agree with the DIA-NN output as a sanity check before tweaking any settings.

I know that I am being annoying right now, but would you be able to look at my updated R-script? Just to quickly check that I am not doing anything wrong now that the new filters have been added.

make_diann_psite_table.txt

vdemichev commented 1 month ago

No worries :)

> siteConfidence = diannReport$PTM.Site.Confidence >= 0.9

DIA-NN looks at the confidence of individual sites in the next column, whereas PTM.Site.Confidence is the 'worst site confidence'. So with this filter you will get fewer hits.
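The difference can be illustrated with a toy example in base R. The per-site scores are held in a plain list here; how the real report encodes them in the column next to `PTM.Site.Confidence` is not shown in this thread, so that layout, the precursor names and the site labels are assumptions.

```r
# Hypothetical per-site localisation confidences for three precursors.
site_conf <- list(
  pepA = c(S10 = 0.99, T20 = 0.50),  # one confident site, one ambiguous site
  pepB = c(S35 = 0.95),
  pepC = c(S7 = 0.92, Y9 = 0.91)
)

# 'Worst site confidence' per precursor, analogous to PTM.Site.Confidence.
worst <- sapply(site_conf, min)

# Filter 1: keep a precursor only if its worst site passes.
# pepA is dropped entirely, including its confident site S10.
kept_precursors <- names(site_conf)[worst >= 0.9]
sites_by_worst <- unlist(lapply(kept_precursors, function(p) {
  paste(p, names(site_conf[[p]]), sep = ".")
}))

# Filter 2: judge each site on its own score.
all_scores <- unlist(site_conf)  # names become e.g. "pepA.S10"
sites_by_site <- names(all_scores)[all_scores >= 0.9]

# Sites lost by filtering on the worst-site score.
print(setdiff(sites_by_site, sites_by_worst))
```

Filtering on the worst score keeps three sites, while per-site filtering keeps four; the extra one is `pepA.S10`, the confident site on a precursor whose other site is ambiguous.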

misak-acrivon commented 4 weeks ago

Hi Vadim,

I have now tried using the R-script that I shared with you earlier on another dataset, but I am getting very different results compared to the phosphosite tables that are output automatically by a DIA-NN run. So there must still be something that I am doing very differently from DIA-NN when creating this table. I understand that it could take you a lot of time to dig into what I might be doing wrong in the shared script.

It would be great if the 'diann-rpackage' that you developed some time ago could have a function to derive any p-site table from the parquet report similar to the ones automatically created by DIA-NN. Would this be an interesting feature to add?

Thanks for all the help so far.

vdemichev commented 4 weeks ago

Hi Marc,

In general we plan to overhaul the R package, but this will not happen within the next two months. About the vastly different results: any discrepancy whatsoever means that either the DIA-NN code or the script you use does not conform to the specification here https://github.com/vdemichev/DiaNN/issues/1174#issuecomment-2360303400. This can easily be checked manually (just for a single peptide ID in a single run) - if it's DIA-NN that produces a different result, I can take a look at why exactly this happens.
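Such a manual check could look like the following base R sketch, which compares one site in one run between a matrix rebuilt by a script and the matrix written by DIA-NN. Both matrices are toy data, and the site labels, run labels, the helper `check_entry` and its tolerance are hypothetical.

```r
# Toy stand-ins: 'mine' rebuilt by a script, 'diann' as read from the DIA-NN matrix.
labels <- list(c("P1_S10", "P1_T20"), c("run1", "run2"))
mine  <- matrix(c(100, 120, 50, NA), nrow = 2, byrow = TRUE, dimnames = labels)
diann <- matrix(c(100, 120, 50, 70), nrow = 2, byrow = TRUE, dimnames = labels)

# Compare a single cell of the two matrices and describe the outcome.
check_entry <- function(a, b, site, run, tol = 1e-6) {
  x <- a[site, run]
  y <- b[site, run]
  if (is.na(x) && is.na(y)) return("both missing")
  if (is.na(x) || is.na(y)) return("missing in one table")
  if (abs(x - y) <= tol * max(abs(x), abs(y), 1)) "match" else "different value"
}

print(check_entry(mine, diann, "P1_S10", "run1"))
print(check_entry(mine, diann, "P1_T20", "run2"))
```

An entry that is missing in only one table points to a filtering difference, whereas a different value points to how rows are combined into one cell. Tracing that single site and run back to its rows in the report then shows which filter is responsible.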

Best, Vadim