vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Clarification on Identification and Quantification Numbers in DIANN #1108

Open hahahahhhhahaha opened 1 month ago

hahahahhhhahaha commented 1 month ago

I hope this message finds you well. I am reaching out with a few questions regarding the identification and quantification of proteins and peptides using DIANN.

I have noticed that the results obtained with the DIANN R package function `diann_maxlfq(df[df$Q.Value <= 0.01 & df$PG.Q.Value <= 0.01,], group.header="Protein.Group", id.header = "Precursor.Id", quantity.header = "Precursor.Normalised")` differ from those in the pg_matrix. The discrepancy might be due to differences between the q-value filters applied in my call and those applied when the pg_matrix is generated.
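One plausible source of such differences is that different q-value filters select different rows before any quantification happens. The minimal Python sketch below assumes the main report has been loaded as a list of dicts keyed by report column names (for example via `pandas.read_parquet(...).to_dict("records")`); the toy rows are made up, and the extra `Global.PG.Q.Value` cutoff in the second filter is only a hypothetical alternative, not a description of how the pg_matrix is built.

```python
def filter_rows(rows, max_q):
    """Keep report rows whose every listed q-value column is <= its cutoff.

    rows:  list of dicts keyed by DIA-NN report column names
    max_q: dict mapping a q-value column name to its maximum allowed value
    """
    return [r for r in rows if all(r[col] <= cutoff for col, cutoff in max_q.items())]


def protein_groups(rows):
    """Set of protein groups that survive into a filtered table."""
    return {r["Protein.Group"] for r in rows}


report = [  # toy rows (made up); real reports have many more columns
    {"Run": "A", "Protein.Group": "P1", "Precursor.Id": "PEPTIDEK2",
     "Q.Value": 0.001, "PG.Q.Value": 0.002, "Global.PG.Q.Value": 0.001},
    {"Run": "A", "Protein.Group": "P2", "Precursor.Id": "SAMPLER2",
     "Q.Value": 0.004, "PG.Q.Value": 0.008, "Global.PG.Q.Value": 0.03},
]

# Same cutoffs as in the diann_maxlfq call above
r_filter = filter_rows(report, {"Q.Value": 0.01, "PG.Q.Value": 0.01})
# Hypothetical stricter filter that also applies an experiment-wide q-value
stricter = filter_rows(report, {"Q.Value": 0.01, "PG.Q.Value": 0.01,
                                "Global.PG.Q.Value": 0.01})

print(sorted(protein_groups(r_filter)))  # ['P1', 'P2']
print(sorted(protein_groups(stricter)))  # ['P1']
```

Any quantification run on `r_filter` versus `stricter` starts from different inputs, so its outputs cannot be expected to match.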

Could you please clarify whether I should use the R code or the pg_matrix output for quantification? Additionally, if I want to count the number of identified proteins or peptides, should I use the main report and remove duplicates directly from it?
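Counting identifications from the main report comes down to filtering and then counting distinct identifiers: the report has one row per precursor per run, so the same protein group appears on many rows. A minimal sketch under the same assumptions (rows as dicts keyed by report column names); the 1% cutoffs on `Q.Value` and `PG.Q.Value` are illustrative choices, and `count_ids` is a hypothetical helper, not part of any DIA-NN package.

```python
from collections import defaultdict


def count_ids(rows, q_cutoff=0.01):
    """Count distinct precursors and protein groups, per run and overall.

    The main report has one row per precursor per run, so protein groups
    repeat across rows; sets remove those duplicates.
    """
    per_run = defaultdict(lambda: {"precursors": set(), "protein_groups": set()})
    for r in rows:
        if r["Q.Value"] <= q_cutoff and r["PG.Q.Value"] <= q_cutoff:
            per_run[r["Run"]]["precursors"].add(r["Precursor.Id"])
            per_run[r["Run"]]["protein_groups"].add(r["Protein.Group"])
    counts = {run: {k: len(v) for k, v in ids.items()} for run, ids in per_run.items()}
    total = {  # union across runs, so an ID seen in several runs counts once
        "precursors": len(set().union(*(ids["precursors"] for ids in per_run.values()))),
        "protein_groups": len(set().union(*(ids["protein_groups"] for ids in per_run.values()))),
    }
    return counts, total


report = [  # toy rows (made up)
    {"Run": "A", "Protein.Group": "P1", "Precursor.Id": "AAAPR2", "Q.Value": 0.001, "PG.Q.Value": 0.001},
    {"Run": "A", "Protein.Group": "P1", "Precursor.Id": "CCCK2", "Q.Value": 0.002, "PG.Q.Value": 0.001},
    {"Run": "B", "Protein.Group": "P1", "Precursor.Id": "AAAPR2", "Q.Value": 0.003, "PG.Q.Value": 0.001},
    {"Run": "B", "Protein.Group": "P2", "Precursor.Id": "DDDR3", "Q.Value": 0.05, "PG.Q.Value": 0.002},
]
counts, total = count_ids(report)
print(counts)  # {'A': {'precursors': 2, 'protein_groups': 1}, 'B': {'precursors': 1, 'protein_groups': 1}}
print(total)   # {'precursors': 2, 'protein_groups': 1}
```

Whether to also require experiment-wide or library q-value cutoffs is a separate decision that changes these counts.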

I would also like to better understand concepts such as the global q-value (Global.Q.Value) and the library q-value (Lib.Q.Value), and how they affect the results. Any guidance you can provide on these issues would be greatly appreciated.
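As a generic illustration of run-specific versus experiment-wide (global) q-values, the sketch below uses plain target-decoy counting with made-up scores. It is not DIA-NN's actual scoring or FDR procedure. Run-specific q-values are computed within each run. The global variant here is one common approach: it keeps each precursor's best score across runs and computes a single q-value list for the whole experiment. Only that distinction is sketched; library q-values are not modelled.

```python
from collections import defaultdict


def q_values(scored):
    """Target-decoy q-values for (identifier, score, is_decoy) tuples.

    FDR at a threshold is estimated as decoys / targets among items scoring at
    or above it; an item's q-value is the lowest FDR of any threshold that still
    accepts it. Ties in score are not treated specially in this sketch.
    """
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    fdr = []
    targets = decoys = 0
    for _ident, _score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))
    q, running_min = [0.0] * len(fdr), float("inf")
    for i in range(len(fdr) - 1, -1, -1):  # from the lowest score upwards
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return {ident: q[i] for i, (ident, _s, dec) in enumerate(ordered) if not dec}


def run_specific_q(rows):
    """rows: (run, precursor, score, is_decoy). q-values computed within each run."""
    by_run = defaultdict(list)
    for run, prec, score, is_decoy in rows:
        by_run[run].append((prec, score, is_decoy))
    return {(run, prec): qv
            for run, items in by_run.items()
            for prec, qv in q_values(items).items()}


def global_q(rows):
    """One q-value per precursor, using its best score across all runs."""
    best = {}
    for _run, prec, score, is_decoy in rows:
        key = (prec, is_decoy)
        best[key] = max(score, best.get(key, float("-inf")))
    return q_values([(prec, score, is_decoy) for (prec, is_decoy), score in best.items()])


rows = [  # made-up scores; D1 is a decoy
    ("r1", "P1", 10.0, False), ("r1", "D1", 5.0, True),
    ("r2", "P1", 2.0, False), ("r2", "P2", 4.0, False),
]
print(run_specific_q(rows)[("r2", "P2")])  # 0.0: passes a 1% cutoff within run r2
print(global_q(rows)["P2"])                # 0.5: fails a 1% experiment-wide cutoff
```

The same precursor can therefore pass a run-specific filter and fail a global one, which is why combining the two kinds of cutoff changes the final numbers.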

Thank you for your assistance and for providing such an excellent tool. It has been incredibly helpful for my research.

vdemichev commented 1 month ago

Hi,

About pg_matrix, please see https://github.com/vdemichev/DiaNN?tab=readme-ov-file#output.

In general, we always recommend using only the main .parquet report; this requires basic familiarity with data processing in R or Python.
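For example, a protein-group by run matrix can be assembled from the main report instead of taken from the pg_matrix. A minimal sketch, assuming rows loaded as dicts (e.g. `pandas.read_parquet(path).to_dict("records")`, with `path` pointing at your report) and illustrative 1% cutoffs. `PG.MaxLFQ` repeats on every precursor row of a protein group within a run, so one value per (Run, Protein.Group) is kept. The helper names are hypothetical, and the result will not necessarily reproduce the pg_matrix, whose filtering is described in the README linked above.

```python
def pg_matrix_from_report(rows, q_cutoff=0.01, value_col="PG.MaxLFQ"):
    """Build {protein_group: {run: quantity}} from precursor-level report rows."""
    matrix = {}
    for r in rows:
        if r["Q.Value"] > q_cutoff or r["PG.Q.Value"] > q_cutoff:
            continue  # skip rows whose precursor or protein group fails the cutoff
        # PG-level values repeat across precursor rows; setdefault keeps the first
        matrix.setdefault(r["Protein.Group"], {}).setdefault(r["Run"], r[value_col])
    return matrix


def to_table(matrix):
    """Render as rows of [protein_group, value_run1, ...], None where missing."""
    runs = sorted({run for per_run in matrix.values() for run in per_run})
    table = [["Protein.Group", *runs]]
    for pg in sorted(matrix):
        table.append([pg, *(matrix[pg].get(run) for run in runs)])
    return table


report = [  # toy rows (made up)
    {"Run": "A", "Protein.Group": "P1", "Q.Value": 0.001, "PG.Q.Value": 0.001, "PG.MaxLFQ": 1500.0},
    {"Run": "A", "Protein.Group": "P1", "Q.Value": 0.004, "PG.Q.Value": 0.001, "PG.MaxLFQ": 1500.0},
    {"Run": "B", "Protein.Group": "P1", "Q.Value": 0.002, "PG.Q.Value": 0.001, "PG.MaxLFQ": 1200.0},
    {"Run": "B", "Protein.Group": "P2", "Q.Value": 0.003, "PG.Q.Value": 0.02, "PG.MaxLFQ": 800.0},
]
for line in to_table(pg_matrix_from_report(report)):
    print(line)
# ['Protein.Group', 'A', 'B']
# ['P1', 1500.0, 1200.0]
```

Keeping the filtering explicit like this makes it clear which thresholds produced each number, which is harder to see from a precomputed matrix.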

Please see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581544/ for an introduction to the different types of FDR control.

Best, Vadim