maximtrp / scikit-posthocs

Multiple Pairwise Comparisons (Post Hoc) Tests in Python
https://scikit-posthocs.rtfd.io
MIT License
351 stars 40 forks source link

Dunnetts test #24

Closed comatrion closed 3 months ago

comatrion commented 5 years ago

Thanks for this package! One useful addition (unless I am missing it) would be Dunnett's test comparison to a control.

maximtrp commented 5 years ago

@comatrion Thank you for suggestion! I will implement it soon.

mysticaltech commented 4 years ago

Implementing Dunnett would be fantastic, please don't hesitate to ping me, I have Matlab and FORTRAN code(that I also converted to C++) implementing it, I also have access to all the major books and papers on the subjects like the one from HSU. Needless to say, there are many R packages implementing it too but those are GPL licensed. I am willing to share all my resources to help speed up implementation, please let me know at supelex@gmail.com.

maximtrp commented 4 years ago

@mysticaltech This test relies upon multivariate Student's t-distribution function (pmvt in R). AFAIR, I could not find a usable implementation of it in Python a year ago. It would be great, if you helped me with it. I hope there is one somewhere on the web :smile:

mysticaltech commented 4 years ago

Maxim, I have detailed papers explaining it, and clear FORTRAN code (from the famous book "Multiple Comparison" by Jason Hsu, with a detailed chapter about it) and Matlab code implementing it, I also was able to transpile the FORTRAN code to C++. Quite readable! Please send a quick hello to the email above and I will reply with my huge folder on the subject zipped. The material I have will probably be of use to you even for other tests. Screen Shot 2020-06-04 at 10 46 23 PM

Tsuchihashi-ryo commented 4 years ago

Thank you for your greatful jobs! I am one of those people hope Dunnett's test comparison. if it helps, please check probmc function in SAS.

Probmc fuction is returning a probability or a quantile from various distributions for multiple comparisons of means.

maximtrp commented 3 years ago

I have looked at the code in pmvt R package and even tried to implement bindings to Fortran code, but failed. This is not trivial. No progress so far, unfortunately. Maybe SAS manual pages will be helpful...

Ronny-PDT commented 2 years ago

Hi Max Great package idea What is the current status for adding Dunnett's post hoc test? I am willing to help you building it, as I need this solution to my application

maximtrp commented 2 years ago

@Ronny-PDT Thank you for your interest! Unfortunately, I do not have enough time to work on it now. You can try to implement it yourself and do a pull request

Tsuchihashi-ryo commented 2 years ago

I was able to make the Dunnett's test based on the SAS description above. However, I used the pingouin library (https://pingouin-stats.org/generated/pingouin.anova.html#pingouin.anova) to calculate ANOVA. Additionally, the run time is so long (25 seconds for three groups, n=6). Furthermore, I am a biological wet researcher, so I am not skilled in writing code. So please improve it and use it.

import numpy as np
from scipy.stats import norm
from scipy import integrate
from scipy.special import gamma
import pingouin as pg

def dunnett_two_side_unbalanced(data, dv, between,control):

  def dunnett_prob(T,v,lambda_params):

    def pdf(x):
      return norm.pdf(x)

    def cdf(x):
      return norm.cdf(x)

    def f(x,y):
      return pdf(y)*np.prod([cdf((lambda_i*y+T*x)/(np.sqrt(1-lambda_i**2)))-cdf((lambda_i*y-T*x)/(np.sqrt(1-lambda_i**2))) for lambda_i in lambda_params])

    def duv(x):
      return (v**(v/2)*x**(v-1)*np.exp(-v*x**2/2))/(gamma(v/2)*2**(v/2-1))

    def f_g(x,y):
      return f(x,y) * duv(x)

    return integrate.dblquad(f_g,-np.inf,np.inf,lambda x:0,lambda x:np.inf)[0]

  #First compute the ANOVA
  aov = pg.anova(dv=dv, data=data, between=between, detailed=True)
  v = aov.at[1, 'DF'] #自由度
  ng = aov.at[0, 'DF'] + 1 #全群数
  grp = data.groupby(between)[dv]
  n_sub = grp.count()
  control_index = n_sub.index.get_loc(control) #対照群のindexを取得
  n = n_sub.values #サンプル数
  gmeans = grp.mean().values#各平均
  gvar = aov.at[1, 'MS'] / n #各分散

  vs_g = np.delete(np.arange(ng),control_index) #対照群以外のindexを取得

  # Pairwise combinations(検定統計量Tを求める)
  mn = np.abs(gmeans[control_index]- gmeans[vs_g])
  se = np.sqrt(gvar[control_index] + gvar[vs_g]) #式は少し違うが等分散を仮定しているため
  tval = mn / se

  #lambdaを求める
  lambda_params = np.sqrt(n[vs_g]/(n[control_index]+n[vs_g]))

  pval = [1-dunnett_prob(t,v,lambda_params) for t in tval]

  stats = pd.DataFrame({
                      'A': [control]*len(vs_g),
                      'B': np.unique(data[between])[vs_g],
                      'mean(A)': np.round(gmeans[control_index], 3)*len(vs_g),
                      'mean(B)': np.round(gmeans[vs_g], 3),
                      'diff': np.round(mn, 3),
                      'se': np.round(se, 3),
                      'T': np.round(tval, 3),
                      'p-dunnett': pval
                      })
  return stats

I am Japanese, so forgive me for including Japanese annotations. And, I have prepared the article in Japanese, so please translate and scrutinize it if you want! (https://qiita.com/mosomoso_1910/items/1d28440d91f63829b6f9)

maximtrp commented 2 years ago

@Tsuchihashi-ryo Thank you very much!

mysticaltech commented 2 years ago

@Tsuchihashi-ryo Wonderful!