vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

How can a user reproduce pr.matrix.tsv from the main output? #1116

Open xiaoxHuang opened 3 months ago

xiaoxHuang commented 3 months ago

Hi Vadim

Thanks for your work!

I am using DIA-NN 1.8.1 on Linux and want to reproduce pr.matrix.tsv from the main output. As you mentioned here: [Screenshot 2024-08-03 220746] "using global q-values for protein groups and both global and run-specific q-values for precursors".

However, the main output contains Q.Value, PG.Q.Value, Global.Q.Value, Global.PG.Q.Value, Lib.Q.Value and Lib.PG.Q.Value. Which of these values are used as filters to produce pr.matrix? On the command line, I tried both --qvalue 0.01 --matrix-qvalue 0.01 and --qvalue 0.03 --matrix-qvalue 0.03, then built the precursor matrix in Python to compare it with the one saved by DIA-NN. Sometimes I cannot reproduce the results.

You also mentioned that all the 'matrices' can be reproduced from the main .parquet report, if generated with precursor FDR set to 5%, using R or Python.

So, is --qvalue 0.05 required for a user to reproduce the matrices in R or Python?

Thanks!

Best wishes

vdemichev commented 3 months ago

Hi,

The docs now describe 1.9.1, so they don't match the output of 1.8.1.

I want to know which one or some values are used as filters to get the pr.matrix?

For 1.8.1 with MBR: df <- df[df$Q.Value <= 0.01 & df$Lib.Q.Value <= 0.01,]. Without MBR, replace Lib.Q.Value with Global.Q.Value.
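Since the original poster works in Python, here is a minimal sketch of the same precursor-level filter, followed by a pivot into a precursor-by-run table. It uses only the standard library and a tiny made-up report. The column names Run, Precursor.Id and Precursor.Normalised, and the helper names filter_precursors and to_matrix, are assumptions for illustration. It applies only the two q-value conditions given above, so it is not guaranteed to reproduce DIA-NN's own matrix exactly.

```python
import csv
import io


def filter_precursors(rows, threshold=0.01, mbr=True):
    """Keep rows passing the run-specific q-value cut plus the
    library (with MBR) or global (without MBR) q-value cut."""
    second = "Lib.Q.Value" if mbr else "Global.Q.Value"
    return [
        r for r in rows
        if float(r["Q.Value"]) <= threshold and float(r[second]) <= threshold
    ]


def to_matrix(rows, id_col="Precursor.Id", run_col="Run",
              qty_col="Precursor.Normalised"):
    """Pivot long-format rows into {precursor: {run: quantity}}.
    The default column names are assumptions, not confirmed in this thread."""
    matrix = {}
    for r in rows:
        matrix.setdefault(r[id_col], {})[r[run_col]] = float(r[qty_col])
    return matrix


# Made-up three-row report in the long format of the main output.
SAMPLE = (
    "Run\tPrecursor.Id\tPrecursor.Normalised\tQ.Value\tLib.Q.Value\tGlobal.Q.Value\n"
    "run1\tPEPTIDEK2\t1000.0\t0.001\t0.002\t0.002\n"
    "run2\tPEPTIDEK2\t1200.0\t0.020\t0.002\t0.002\n"
    "run1\tANOTHERR2\t500.0\t0.005\t0.030\t0.004\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
matrix_mbr = to_matrix(filter_precursors(rows, mbr=True))
matrix_no_mbr = to_matrix(filter_precursors(rows, mbr=False))
```

To try this on a real report, replace io.StringIO(SAMPLE) with an opened file handle, for example open("report.tsv", newline=""), where report.tsv is a placeholder name for the main output.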

I can not reproduce the results sometimes.

This is the most popular question here :) If you wish, I could take a look at the data (I need the full logs and the file name of the matrix you are looking at), but there's no practical reason to reproduce the matrix. If you work in R or Python, the advice is to never use matrices.

Best, Vadim

xiaoxHuang commented 3 months ago

Hi,

Thanks for your reply. You wrote: "If you work in R or Python, the advice is to never use matrices." The reason I want to reproduce pr.matrix.tsv is to make sure that I can get results as reliable as DIA-NN's (because DIA-NN outputs pr.matrix.tsv). You recommend not using matrices; does that mean the main output can be used as the final result without filtering?

Thanks!

Best regards

vdemichev commented 3 months ago

You recommend not using matrices; does that mean the main output can be used as the final result without filtering?

Please do filter, see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581544/ for basics on how to filter data. Please also see the "How to choose the FDR/q-value threshold?" section of https://github.com/vdemichev/DiaNN?tab=readme-ov-file#frequently-asked-questions and https://github.com/vdemichev/DiaNN?tab=readme-ov-file#match-between-runs.

I guess the DIA-NN docs are missing a dedicated section on output filtering. I will add one.

Best, Vadim