vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

how to calculate pr_matrix.tsv #1072

Closed hxxhust163 closed 2 months ago

hxxhust163 commented 3 months ago

Hi

Thanks for your work on DIA-NN. I used DIA-NN 1.8.1 on Linux and have a few questions about '--qvalue' and '--matrix-qvalue'. With the same DIA data, spectral library and other parameters, I tested different combinations of 'qvalue' and 'matrix-qvalue'.

The parameters and results are shown below. 20.9M is the size of the main output, and 529.1k is the size of pr_matrix.tsv.

[Screenshot from 2024-07-03 17-00-28: parameter combinations and results; image not included]

My questions are:

Q1. With qvalue=0.01 and matrix-qvalue=0.01, filtering the main output by 'Lib.PG.Q.Value'<=0.01 and converting it to wide format gives a tsv containing 3891 precursors, the same as the pr_matrix.tsv reported by DIA-NN. So I think DIA-NN converts the main output into pr_matrix in the same way. Is that right?

Q2. With qvalue=1 and matrix-qvalue=0.01, filtering the main output by 'Lib.PG.Q.Value'<=0.01 and converting it to wide format gives a tsv containing 7559 precursors, which is larger than the 5352 reported in pr_matrix.tsv. So how is 'pr_matrix' calculated? The same question applies to qvalue=1, matrix-qvalue=1 and qvalue=1, matrix-qvalue=0.05: both give 7218 precursors, which seems strange.

Q3. Call the main output for qvalue=1, matrix-qvalue=0.01 'M1', and the main output for qvalue=0.01, matrix-qvalue=0.01 'M2'. How can I get 'M2' from 'M1', given that qvalue=1 makes 'M1' a complete result? I tried filtering 'M1' by 'Q.Value'<=0.01 and got a tsv containing 12948 rows, which is more than the 10075 in 'M2'. How can this be explained?

I want to get a complete main output by running DIA-NN only once and then apply different FDR thresholds myself based on it. That is why I made the above comparisons.

The results above confuse me. Maybe there is something wrong with my settings? I am writing to ask for help. Thanks very much!

Best wishes Xiaoxiang

vdemichev commented 3 months ago

Hi Xiaoxiang,

Q1. Sounds right.
Q2. Matrix generation also involves applying a run-specific q-value filter, i.e. Q.Value <= 0.01.
Q3. Did you also filter by Lib.Q.Value?
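To illustrate the Q2 answer, here is a minimal sketch of why filtering on 'Lib.PG.Q.Value' alone yields more precursors than filtering on both 'Lib.PG.Q.Value' and the run-specific 'Q.Value'. It is not DIA-NN's own code. The function name `precursor_matrix`, the toy rows, and the column names 'Run', 'Precursor.Id' and 'Precursor.Normalised' are assumptions for illustration; only 'Q.Value' and 'Lib.PG.Q.Value' are named in this thread, and the documentation remains the reference for the exact procedure.

```python
from collections import defaultdict


def precursor_matrix(rows, run_q=0.01, matrix_q=0.01,
                     id_col="Precursor.Id", run_col="Run",
                     value_col="Precursor.Normalised"):
    """Pivot long-format report rows into {precursor: {run: quantity}}.

    A row is kept only if it passes both filters mentioned in the thread:
    run-specific Q.Value <= run_q and Lib.PG.Q.Value <= matrix_q.
    Column names other than the two q-value columns are assumptions.
    """
    matrix = defaultdict(dict)
    for row in rows:
        if float(row["Q.Value"]) > run_q:
            continue  # fails the run-specific filter
        if float(row["Lib.PG.Q.Value"]) > matrix_q:
            continue  # fails the matrix-level filter
        matrix[row[id_col]][row[run_col]] = float(row[value_col])
    return dict(matrix)


# Toy rows standing in for what csv.DictReader(f, delimiter="\t")
# would yield from a main report written with qvalue=1.
rows = [
    {"Run": "r1", "Precursor.Id": "PEPA2", "Q.Value": "0.001",
     "Lib.PG.Q.Value": "0.002", "Precursor.Normalised": "100.0"},
    {"Run": "r2", "Precursor.Id": "PEPA2", "Q.Value": "0.2",
     "Lib.PG.Q.Value": "0.002", "Precursor.Normalised": "90.0"},
    {"Run": "r1", "Precursor.Id": "PEPB2", "Q.Value": "0.3",
     "Lib.PG.Q.Value": "0.005", "Precursor.Normalised": "50.0"},
    {"Run": "r1", "Precursor.Id": "PEPC3", "Q.Value": "0.004",
     "Lib.PG.Q.Value": "0.2", "Precursor.Normalised": "70.0"},
]

lib_only = precursor_matrix(rows, run_q=1.0)  # Lib.PG.Q.Value filter only
both = precursor_matrix(rows)                 # both filters at 0.01
print(len(lib_only), len(both))
```

In the toy data, PEPB2 passes the 'Lib.PG.Q.Value' filter but has no run with Q.Value <= 0.01, so it appears in `lib_only` and not in `both`. The same effect would explain a count such as 7559 shrinking to 5352.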

In general, the documentation of DIA-NN describes exactly how the matrices are obtained. However, there's no practical purpose in attempting to reproduce the matrices. If you use R or Python, which is recommended, then please use exclusively the main report and never the matrices. If you use MS Excel or similar software, then it's only practical to use the matrices and not the main report.

> I want to get a complete main output by running DIA-NN only once and then apply different FDR thresholds myself based on it. That is why I made the above comparisons.

For this please use the main report and apply the recommended filtering, as described in the docs.
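As a rough sketch of that workflow, a single main report written with qvalue=1 can be re-filtered at several thresholds after the run. The helper `filter_rows` and the toy rows are hypothetical. The columns used are only the ones mentioned in this thread ('Q.Value' and 'Lib.Q.Value'); which columns and cut-offs to apply is an assumption here and should be taken from the DIA-NN documentation.

```python
def filter_rows(rows, cutoffs):
    """Keep rows where every listed column is <= its cut-off.

    `cutoffs` maps a report column name to a threshold,
    e.g. {"Q.Value": 0.01, "Lib.Q.Value": 0.01}.
    """
    return [
        row for row in rows
        if all(float(row[col]) <= cut for col, cut in cutoffs.items())
    ]


# Toy stand-in for a main report produced with qvalue=1.
report = [
    {"Run": "r1", "Precursor.Id": "PEPA2", "Q.Value": "0.002", "Lib.Q.Value": "0.001"},
    {"Run": "r1", "Precursor.Id": "PEPB2", "Q.Value": "0.008", "Lib.Q.Value": "0.03"},
    {"Run": "r1", "Precursor.Id": "PEPC3", "Q.Value": "0.03", "Lib.Q.Value": "0.004"},
    {"Run": "r2", "Precursor.Id": "PEPD2", "Q.Value": "0.2", "Lib.Q.Value": "0.5"},
]

counts = {}
for fdr in (0.01, 0.05):
    q_only = filter_rows(report, {"Q.Value": fdr})
    q_and_lib = filter_rows(report, {"Q.Value": fdr, "Lib.Q.Value": fdr})
    counts[fdr] = (len(q_only), len(q_and_lib))
    print(fdr, len(q_only), len(q_and_lib))
```

At the 0.01 threshold the 'Q.Value'-only filter keeps more rows than the combined filter, which mirrors the Q3 observation that filtering 'M1' on 'Q.Value' alone leaves more rows than 'M2' contains.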

Best, Vadim

hxxhust163 commented 3 months ago

Hi Vadim

Thanks for your help! I have figured out Q1 and Q2. For Q3 and the rest, I will try again.

Best wishes Xiaoxiang