obiba / mica2

Mica is a web portal for epidemiological study consortia.
http://www.obiba.org/pages/products/mica/
GNU General Public License v3.0

MK-205: Validate combined variance calculation #1967

Open ymarcon opened 9 years ago

ymarcon commented 9 years ago

Jira issue originally created by user @ymarcon:

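The grand mean GM is the weighted mean of the group means, each weighted by its group size: GM = Sum[n(i) · Y(i)] / N, where Y(i) is the mean of group i, n(i) is its number of observations, and N = Sum[n(i)] is the total number of observations.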
(1) Square the standard deviations within each group to get the variances, V(i).
Multiply each variance by  n(i)-1, the so-called “degrees of freedom” for each group, getting what is called the “Error Sum of Squares” within each group, ESSG(i):
  ESSG(i) = V(i) · (n(i)-1)
For example, if group 5 has n(5)=16 observations and standard deviation 3, the variance is 3² = 9, so you would compute 9 · (16-1) = 135.
(2) Add these up over all of your groups, getting an overall Error Sum of Squares:
  ESS = ESSG(1) + ESSG(2) + … + ESSG(G)
But wait - there's more!  Your individual group means are varying around the overall mean GM and we have to take that into account, so....
(3) Compute the deviation  Y(i)-GM  of each group mean from your overall grand mean GM.  Square each one and multiply by its n(i).
For example, if in group 5 you have mean 82 and the overall mean is GM=80, you would compute 16 · 4 = 64, because we had n(5)=16 observations in group 5, and 4 is the square of 2 (i.e. the square of  82-80).  This is the “Group sum of Squares” for group 5.
  GSS(i) = (Y(i)-GM)² · n(i)
(4) Sum these group sums of squares over all G of your groups getting the total (overall) group sum of squares:
  TGSS = GSS(1) + GSS(2) + … + GSS(G)
(5) Add the overall Error Sum of Squares ESS from step (2) to the overall Group Sum of Squares TGSS from step (4) to get the “Total Sum of Squares.”  Now divide this by the “degrees of freedom” N-1 where you recall that N is the total number of observations you have.  This is the grand variance you seek:
  GV = (ESS + TGSS) / (N-1)
Take the square root of that to get the composite standard deviation you seek:
  composite standard deviation = √GV
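
The steps above can be sketched in Python as follows. This is only an illustration, not the code used in Mica: the function name `combined_stats` and the `(n, mean, sd)` tuple layout are assumptions made for this example, and it assumes each group's standard deviation is the sample standard deviation (computed with the n(i)-1 denominator).

```python
import math
import statistics


def combined_stats(groups):
    """Combine per-group summaries into a grand mean, variance and SD.

    `groups` is a list of (n, mean, sd) tuples (hypothetical layout), where
    sd is assumed to be the sample standard deviation of the group.
    """
    total_n = sum(n for n, _, _ in groups)
    if total_n < 2:
        raise ValueError("need at least two observations overall")

    # Grand mean: group means weighted by group size, divided by N.
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Steps (1)-(2): error sum of squares, ESS = Sum[V(i) * (n(i) - 1)].
    ess = sum(sd ** 2 * (n - 1) for n, _, sd in groups)

    # Steps (3)-(4): total group sum of squares,
    # TGSS = Sum[n(i) * (Y(i) - GM)^2].
    tgss = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)

    # Step (5): grand variance and composite standard deviation.
    grand_variance = (ess + tgss) / (total_n - 1)
    return grand_mean, grand_variance, math.sqrt(grand_variance)


# Validation idea: the combined result should match the statistics
# computed directly on the pooled raw observations.
samples = [[2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0, 9.0]]
summaries = [(len(s), statistics.mean(s), statistics.stdev(s)) for s in samples]
gm, gv, sd = combined_stats(summaries)

pooled = [x for s in samples for x in s]
assert math.isclose(gm, statistics.mean(pooled))
assert math.isclose(gv, statistics.variance(pooled))
assert math.isclose(sd, statistics.stdev(pooled))
```

Comparing against the pooled raw data, as in the last lines, is one way to validate the calculation when the individual observations are available. When only the per-group summaries are known, the formula is the only option.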

See also http://www.emathzone.com/tutorials/basic-statistics/combined-variance.html

ymarcon commented 9 years ago

Comment created by @obiba-ci:

SUCCESS: Integrated in Mica2 #856 MK-205 Refactored combined statistics code (extracted from REST resource) (yannick.marcon: rev 70924191f18441722f2cfbd9b0c0ce9f28e684f1)

ymarcon commented 9 years ago

Comment created by @obiba-ci:

SUCCESS: Integrated in Mica2 #872 MK-205 Combined statistics test added (yannick.marcon: rev 34de1f57e0f13f4ddfb4aa9a23af025b889525a6)

ymarcon commented 9 years ago

Comment created by @obiba-ci:

SUCCESS: Integrated in Mica2 #873 MK-205 R code for Combined statistics test added (yannick.marcon: rev 33a3aea6bb5eddff6473d1acf9e0b60d843c3ba6)