ropensci / tidyqpcr

quantitative PCR analysis in the tidyverse
https://docs.ropensci.org/tidyqpcr/

Implement geNorm reference gene selection method #17

Open ewallace opened 4 years ago

ewallace commented 4 years ago

Check the MIQE-recommended geNorm method, and compare it in detail to tidyqpcr's normalizeqPCR function, which normalizes geometrically (on the Ct/log scale) and uses the median.

Method is described in Vandesompele et al., Genome Biology, 2002, 'Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes'.

According to the geNorm website, this method is implemented in commercial software qbase+.

Fixing this will likely require adding an "efficiency" argument to normalizeqPCR.

ewallace commented 4 years ago

See also the relative quantification framework paper, which includes taking a mean of efficiencies when normalizing across multiple reference genes.

That 2007 paper refers to qbase as open-source software, but I could not find any evidence that the source is still available.

ewallace commented 4 years ago

Clarification: as it says on the geNorm website:

geNorm is a popular algorithm to determine the most stable reference (housekeeping) genes from a set of tested candidate reference genes in a given sample panel. From this, a gene expression normalization factor can be calculated for each sample based on the geometric mean of a user-defined number of reference genes.

In tidyqpcr, normalizeqPCR already calculates the expression normalization factor for each sample (SampleID), based on the geometric median or mean of user-defined reference genes.
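
To make the "geometric" point concrete, here is a minimal sketch in Python (not tidyqpcr code). It assumes relative quantity is efficiency^(-Ct) with the same efficiency for every gene; the function names and Ct values are made up for illustration. Under that assumption, taking the arithmetic mean of Ct values is the same as taking the geometric mean of the quantities, and the median can be swapped in on the Ct scale.

```python
import math
from statistics import mean, median


def normalization_ct(ref_cts, summary=median):
    """Summarise one sample's reference-gene Ct values on the Ct (log) scale."""
    return summary(ref_cts)


def geometric_mean_quantity(ref_cts, efficiency=2.0):
    """Geometric mean of relative quantities, assuming quantity = efficiency ** (-Ct)."""
    log_quantities = [-ct * math.log(efficiency) for ct in ref_cts]
    return math.exp(mean(log_quantities))


# Hypothetical Ct values for three reference genes in a single sample.
ref_cts = [18.0, 20.0, 25.0]

ct_mean = normalization_ct(ref_cts, summary=mean)  # arithmetic mean on the Ct scale
ct_median = normalization_ct(ref_cts)              # median on the Ct scale
quantity_gm = geometric_mean_quantity(ref_cts)     # geometric mean on the quantity scale
```

If gene-specific efficiencies differ, this equivalence no longer holds as written, which is why an "efficiency" argument would matter.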

Implementation of geNorm would be a separate function to select reference genes from a dataset, as described in this paper:

Vandesompele, J., De Preter, K., Pattyn, F. et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 3, research0034.1 (2002).

For every control gene we determined the pairwise variation with all other control genes as the standard deviation of the logarithmically transformed expression ratios, and defined the internal control gene-stability measure M as the average pairwise variation of a particular gene with all other control genes. Genes with the lowest M values have the most stable expression. Assuming that the control genes are not co-regulated, stepwise exclusion of the gene with the highest M value results in a combination of two constitutively expressed housekeeping genes that have the most stable expression in the tested samples.

Materials and methods says:

For every combination of two internal control genes j and k, an array A_jk of m elements is calculated which consist of log2-transformed expression ratios a_ij / a_ik (Equation 2). We define the pairwise variation V_jk for the control genes j and k as the standard deviation of the A_jk elements (Equation 3). The gene-stability measure M_j for control gene j is the arithmetic mean of all pairwise variations V_jk (Equation 4).
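
A rough sketch of that calculation, together with the stepwise exclusion described above, written in plain Python rather than as a tidyqpcr function. All names here are hypothetical. The input is assumed to be relative quantities (not Ct values), one list per gene with samples in the same order, and the sample standard deviation (n - 1 denominator) is my assumption about which standard deviation is meant.

```python
import math
from statistics import mean, stdev


def pairwise_variation(q_j, q_k):
    """V_jk: standard deviation across samples of log2(a_ij / a_ik)."""
    return stdev([math.log2(a / b) for a, b in zip(q_j, q_k)])


def stability_m(quantities):
    """M_j for every gene: mean of V_jk over all other genes k.

    quantities maps gene name -> list of relative quantities per sample.
    """
    genes = list(quantities)
    return {
        j: mean([pairwise_variation(quantities[j], quantities[k])
                 for k in genes if k != j])
        for j in genes
    }


def stepwise_exclusion(quantities):
    """Repeatedly drop the gene with the highest M until two genes remain."""
    remaining = dict(quantities)
    excluded = []  # least stable gene first
    while len(remaining) > 2:
        m = stability_m(remaining)
        worst = max(m, key=m.get)
        excluded.append(worst)
        del remaining[worst]
    return list(remaining), excluded


# Made-up quantities for three candidate genes across four samples.
example = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],  # always twice geneA, so V_AB = 0
    "geneC": [1.0, 1.0, 8.0, 2.0],   # varies independently
}
m_values = stability_m(example)
kept, dropped = stepwise_exclusion(example)
```

In this toy example geneA and geneB track each other exactly, so geneC has the highest M and is the one excluded. M is recomputed after each exclusion, because removing a gene changes the averages for the others.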

Then:

Taking all this into consideration, we recommend the minimal use of the three most stable internal control genes for calculation of an RT-PCR normalization factor (NF_n, n = 3), and stepwise inclusion of more control genes until the (n + 1)th gene has no significant contribution to the newly calculated normalization factor (NF_{n+1}).
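
One way to put a number on "no significant contribution" is to compare consecutive normalization factors sample by sample. The sketch below (plain Python, hypothetical names, made-up data) computes NF_n as the geometric mean of the n most stable genes and then the standard deviation of log2(NF_n / NF_{n+1}) across samples. It assumes the gene columns are already ordered from most to least stable, and it leaves the choice of cut-off to the caller.

```python
import math
from statistics import stdev


def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalization_factors(samples, n):
    """NF_n per sample: geometric mean of the first n gene quantities in each row."""
    return [geometric_mean(row[:n]) for row in samples]


def nf_pairwise_variation(samples, n):
    """Standard deviation across samples of log2(NF_n / NF_{n+1})."""
    nf_n = normalization_factors(samples, n)
    nf_n1 = normalization_factors(samples, n + 1)
    return stdev([math.log2(a / b) for a, b in zip(nf_n, nf_n1)])


# Made-up data: rows are samples, columns are genes ordered most stable first.
samples = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 16.0],  # the fourth gene jumps in this sample only
    [1.0, 1.0, 1.0, 1.0],
]
v_3_4 = nf_pairwise_variation(samples, 3)
```

A small value means adding the (n + 1)th gene barely changes the normalization factor; a large value, as in this toy data, means it does.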

Implementing that in tidyqpcr would involve:

This is feasible, and we should ask users if the feature is a priority.