Question about searching sub-cohort versus full cohort of files

vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Question about searching sub-cohort versus full cohort of files #1170

Closed dtabang closed 2 months ago

dtabang commented 2 months ago

Hello, I have a question about how cross-run normalization of quantification values and FDR are affected by the number of files searched together. Do these values (and others, such as the number of proteins with quantifiable intensities per file) change depending on what proportion of my full cohort I search? E.g., if I search only half of my files together (100 files) versus the full 200-file cohort? Thanks.

vdemichev commented 2 months ago

Hi,

In terms of FDR, global FDR control ('Global' q-values in DIA-NN main output report without MBR and 'Lib' q-values with MBR) ensures identification confidence regardless of the experiment size.

Normalisation should work fine regardless of the number of runs; however, the exact quantities obtained are always influenced to some extent by what is included in the experiment. Ideally, blanks and failed runs should not be included in the final analysis.

Best, Vadim

dtabang commented 2 months ago

Thanks Vadim for your quick answer! Sounds like searching the sub-cohort versus the full cohort should not matter much as long as failed runs are not included.
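
To check this on one's own data, one option is to count the protein groups per run that pass a global q-value cut, once for the sub-cohort search and once for the full-cohort search, and compare the two. The following is only a minimal sketch under stated assumptions: the function name `protein_groups_per_run` is hypothetical, the column names `Run`, `Protein.Group` and `Global.Q.Value` are assumed to match the tab-separated main report of the DIA-NN version in use (with MBR, the 'Lib' q-value column would be the one to filter on), the 1% threshold is an assumption, and the toy rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def protein_groups_per_run(report_tsv, q_column="Global.Q.Value", threshold=0.01):
    """Count distinct protein groups per run among rows passing the q-value cut.

    report_tsv: an open text file (or file-like object) with tab-separated rows.
    q_column:   assumed name of the global q-value column in the report.
    """
    groups = defaultdict(set)
    reader = csv.DictReader(report_tsv, delimiter="\t")
    for row in reader:
        if float(row[q_column]) <= threshold:
            groups[row["Run"]].add(row["Protein.Group"])
    return {run: len(pgs) for run, pgs in groups.items()}


# Made-up toy rows standing in for a real report (assumed column names).
toy = (
    "Run\tProtein.Group\tGlobal.Q.Value\n"
    "run1\tP1\t0.001\n"
    "run1\tP2\t0.020\n"
    "run2\tP1\t0.005\n"
    "run2\tP3\t0.009\n"
)
counts = protein_groups_per_run(io.StringIO(toy))
print(counts)  # {'run1': 1, 'run2': 2}
```

Running the same function on the reports from the 100-file and 200-file searches and comparing the per-run counts for the shared files would show how much the cohort size matters in practice.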