venkatmi / altanalyze

Automatically exported from code.google.com/p/altanalyze

t-test calculation for unequal sized groups is inaccurate #1

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Applicable to version 1.15 and below. Three issues with the t-test calculation 
were recently discovered:
  1) The equal-group-size t-test calculation was used for unequal group sizes (http://en.wikipedia.org/wiki/Student's_t-test).
  2) The probability equation for t-test p-values rounded the degrees of freedom to the nearest integer rather than rounding down.
  3) The equation that assumes equal variance was incorrect.

The t-test calculation is used only to compute splicing-index and FIRMA 
p-values, and it always assumes unequal variance. Thus, this issue has only a 
minor impact on the t-test p-values of splicing scores when the group sizes 
are unequal. Item 3 is not really applicable, since equal variance is not 
currently assumed anywhere in AltAnalyze. To test this, load the statistics.py 
module and submit group values for equal-sized and unequal-sized groups, with 
equal and unequal variance (2 and 3), to the ttest() and t_probability() functions.

To correct this, we have replaced the ttest method with the 
statistics.OneWayANOVA() function in version 1.16.
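
For anyone who wants an independent reference point when checking items 1 and 2, here is a minimal sketch of the unequal-variance (Welch) t statistic and its Welch-Satterthwaite degrees of freedom. It is not the AltAnalyze code: welch_t() and the sample values are made up for illustration, and the formulas are the standard ones from the Wikipedia page linked above.

```python
import math

def welch_t(group1, group2):
    """Hypothetical helper: return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / float(n1)
    mean2 = sum(group2) / float(n2)
    # Unbiased sample variances (divide by n - 1).
    var1 = sum((x - mean1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in group2) / (n2 - 1)
    # Each group contributes variance / n, so unequal sizes are handled here.
    se1, se2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Made-up values: unequal group sizes (4 vs 6) and unequal variances.
t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
df_floor = int(math.floor(df))  # rounding down, as item 2 says it should be
df_round = int(round(df))       # rounding to nearest, the behavior reported in item 2
print("t=%.4f df=%.4f floor=%d round=%d" % (t, df, df_floor, df_round))
```

With these values the degrees of freedom come out at about 6.59, so rounding down gives 6 while rounding to the nearest integer gives 7. That one-step difference is the kind of discrepancy item 2 describes.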

Original issue reported on code.google.com by nsalomo...@gmail.com on 2 Oct 2010 at 5:56

GoogleCodeExporter commented 8 years ago

Original comment by nsalomo...@gmail.com on 2 Oct 2010 at 5:58

GoogleCodeExporter commented 8 years ago

Original comment by nsalomo...@gmail.com on 2 Oct 2010 at 5:59

GoogleCodeExporter commented 8 years ago
This problem is almost always caused by how the exon and junction files are 
named. AltAnalyze must properly match up the exon and junction files that 
correspond to a single sample, or this error will occur. An example of a valid 
pair of file names is "patient1__exon.bed" and "patient1__junction.bed". A bad 
example is "patient1_exon.bed" and "patient1_junction.bed", where only one 
underscore is used instead of two. The text "exon" and "junction" here is not 
important (anything can follow the __, and AltAnalyze automatically detects 
exon versus junction files based on the number of columns in the file). 
Rather, the identical sample name must precede the double underscore in both 
file names. In the next version of AltAnalyze (version 2.0.8), this problem 
will be caught before the analysis is run.
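
Until then, the naming rule above can be checked by hand with something like the following. This is only a rough sketch, not what AltAnalyze does internally: group_by_sample() and find_naming_problems() are hypothetical names, and the check looks only at file names (it does not inspect column counts to tell exon from junction files).

```python
import os
from collections import defaultdict

def group_by_sample(filenames, sep="__"):
    """Group file names by the sample name that precedes the double underscore."""
    groups = defaultdict(list)
    unmatched = []
    for name in filenames:
        base = os.path.basename(name)
        if sep in base:
            sample = base.split(sep, 1)[0]
            groups[sample].append(base)
        else:
            # No double underscore, so no sample name can be extracted.
            unmatched.append(base)
    return dict(groups), unmatched

def find_naming_problems(filenames, sep="__"):
    """Return a list of human-readable problems; an empty list means all files pair up."""
    groups, unmatched = group_by_sample(filenames, sep)
    problems = ["no '%s' in file name: %s" % (sep, n) for n in unmatched]
    for sample, files in sorted(groups.items()):
        if len(files) != 2:
            problems.append("sample '%s' has %d file(s), expected 2: %s"
                            % (sample, len(files), ", ".join(files)))
    return problems

good = find_naming_problems(["patient1__exon.bed", "patient1__junction.bed"])
bad = find_naming_problems(["patient1_exon.bed", "patient1_junction.bed"])
print(good)
print(bad)
```

The valid pair from the comment yields no problems, while the single-underscore pair yields one message per file.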

Original comment by nsalomo...@gmail.com on 15 Jan 2014 at 6:22