vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

How to understand PG.Q.Value in result file #1176

Open Gambrian opened 4 days ago

Gambrian commented 4 days ago

Hi Dr. Vdemichev,

Recently I used the DIA-NN report.tsv for downstream analysis with diann-rpackage. After running

protein.groups <- diann_maxlfq(df[df$Q.Value <= 0.01 & df$PG.Q.Value <= 0.01,], group.header="Protein.Group", id.header = "Precursor.Id", quantity.header = "Precursor.Normalised")

the number of precursors decreased by about 4% (the basic unit of report.tsv is the precursor, did I understand that correctly?). The main reason was the PG.Q.Value filter. I've read the DIA-NN usage notes, which describe it as "run-specific q-value for proteome, channel-specific". I'm new to DIA proteomics and don't know what "channel" means. Can I treat one raw file as one channel? Why is "channel-specific" emphasized, and why doesn't diann directly remove the precursors with PG.Q.Value<= 0.01? How did diann get this number? Are protein groups with a p<0.01 not credible?

Best

vdemichev commented 2 days ago

Hi,

'Channel' here only applies when you use multiplexing and --channels in DIA-NN, which I guess is not the case?

> why doesn't diann directly remove the precursors with PG.Q.Value<= 0.01

Because in many cases one does not need high run-specific protein confidence, and applying a global filter is sufficient.
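
To make the difference between the two filters concrete, here is a minimal sketch on made-up rows. It assumes the report also carries a Global.PG.Q.Value column next to PG.Q.Value; the helper name `keep` and every value in the toy table are hypothetical and only illustrate the idea, not how DIA-NN or diann-rpackage work internally.

```python
# Toy rows mimicking a few report.tsv columns; all values are made up.
rows = [
    {"Run": "run1", "Precursor.Id": "PEPTIDEA2", "Q.Value": 0.001,
     "PG.Q.Value": 0.002, "Global.PG.Q.Value": 0.001},
    {"Run": "run2", "Precursor.Id": "PEPTIDEA2", "Q.Value": 0.004,
     "PG.Q.Value": 0.030, "Global.PG.Q.Value": 0.001},
    {"Run": "run1", "Precursor.Id": "PEPTIDEB2", "Q.Value": 0.008,
     "PG.Q.Value": 0.009, "Global.PG.Q.Value": 0.005},
    {"Run": "run2", "Precursor.Id": "PEPTIDEC3", "Q.Value": 0.020,
     "PG.Q.Value": 0.004, "Global.PG.Q.Value": 0.002},
]


def keep(rows, pg_column, threshold=0.01):
    """Keep rows passing the precursor q-value filter and the chosen protein-level filter."""
    return [
        r for r in rows
        if r["Q.Value"] <= threshold and r[pg_column] <= threshold
    ]


# Run-specific protein filter: the protein group must be confident in that run.
run_specific = keep(rows, "PG.Q.Value")

# Global protein filter: the protein group must be confident across the experiment.
global_only = keep(rows, "Global.PG.Q.Value")

print(len(run_specific), len(global_only))
```

In this toy table the run-specific filter drops the run2 row whose protein group is only weakly supported in that run, while the global filter keeps it. That is the kind of precursor loss described in the question.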

> Are protein groups with a p<0.01 not credible?

Any q-value reflects the proportion of false IDs you get. So if you filter at 1% run-specific, 1% of IDs remaining after filtering are expected to be false.
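
As a rough illustration of that statement, the expected number of false IDs is the number of IDs passing the filter multiplied by the q-value threshold. The count of 5000 protein groups below is a hypothetical figure, and the function name is made up for this sketch.

```python
def expected_false_ids(n_passing, q_threshold):
    """Expected number of false identifications among those passing a q-value filter."""
    return n_passing * q_threshold


n_protein_groups = 5000  # hypothetical number of protein groups passing the filter
n_false = expected_false_ids(n_protein_groups, 0.01)
print(f"about {n_false:.0f} of {n_protein_groups} are expected to be false")
```

So a 1% filter does not mean the remaining IDs are all correct, only that the expected false fraction among them is around 1%.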

In general, I highly recommend the paper by Rosenberger et al. on the topic of FDRs: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581544/.

Best, Vadim