jasonwgriffin / metapower


Error in subgroup_power() sample size? #5

Closed mcweenysean closed 3 years ago

mcweenysean commented 3 years ago

Hi,

First, thank you for putting together this package - it will come in super handy. I think I may have caught a bug in the subgroup_power() function relating to sample size. Both in the package version I am using (0.2.1) and in the GitHub repository, these lines appear:

 if(es_type == "d"){
    sample_size <- sample_size/2

This is true if and only if you are using 2 groups. When I enter a sample size of 66 with three groups, the expected sample size output shows as 33 per group. I don't know the math super well or what compute_variance() does exactly, but downstream after dividing by 2, you divide sample_size again by n_groups when calculating variance. Assuming the first argument in compute_variance() is sample size, this would greatly increase variance since you have divided the sample size twice now.

Please let me know if anything I said is incorrect or you have any questions. Thank you!

jasonwgriffin commented 3 years ago

Thank you for your input and feedback! However, this particular line of code is specific to Cohen's d, not to the subgroups. Currently, sample_size reflects the total number of participants in a study. Cohen's d, the standardized mean difference between 2 groups, assumes an equal number of participants in each group (i.e., sample_size/2), which ensures that the variance is computed appropriately given the 2 groups' sample sizes (see below). In contrast, for other metrics such as correlations, the sample size is not divided by 2 at the outset because of the within-subject nature of that metric; this also ensures that the variance for correlations is computed correctly within the compute_variance() function (see below).

compute_variance <- function(sample_size, effect_size, es_type, con_table){

  if (es_type == "d") {
    # Cohen's d: sample_size is the per-group n (equal group sizes assumed)
    return(round(((sample_size + sample_size) / (sample_size * sample_size)) +
                   ((effect_size^2) / (2 * (sample_size + sample_size))), 5))

  } else if (es_type == "r") {
    # Correlation: variance of Fisher's z, 1 / (n - 3)
    return(1 / (sample_size - 3))

  } else if (es_type == "or") {
    # Odds ratio: variance of the log odds ratio from the four 2x2 cell counts
    con_table <- data.frame(a = con_table[1],
                            b = con_table[2],
                            c = con_table[3],
                            d = con_table[4])
    return((1/con_table$a) + (1/con_table$b) + (1/con_table$c) + (1/con_table$d))
  }
}
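
For reference, the Cohen's d branch of compute_variance() can be written more compactly. In the sketch below, n stands for the per-group sample size (the value passed in as sample_size) and d for effect_size; the numbers n = 11 and d = 0.3 are illustrative choices only, not values taken from the package.

```latex
v_d = \frac{n + n}{n \cdot n} + \frac{d^2}{2\,(n + n)}
    = \frac{2}{n} + \frac{d^2}{4n}

% Worked example with n = 11, d = 0.3:
v_d = \frac{2}{11} + \frac{0.09}{44}
    \approx 0.181818 + 0.002045
    \approx 0.18386
```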

After replicating your parameters, I do notice that the summary output prints the overall sample size per group (i.e., 33, which is sample_size/2); however, the compute_variance() function takes sample_size/n_groups as input to compute the variance for each subgroup, assuming the subgroups are the same size (i.e., 11 when n_groups = 3). In conclusion, I believe that everything is being calculated correctly, but the output is confusing. I am going to change the summary output of subgroup_power() to clearly show the sample sizes of group 1 and group 2, as well as the subgroup sample sizes within each.
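
The arithmetic described above can be traced with a small sketch. It is only an illustration of the two divisions as explained in this thread, not the package's actual internals: d_variance() is a hypothetical stand-in that repeats the "d" branch of compute_variance(), and the effect size of 0.3 is an assumed value.

```r
# Illustrative only: d_variance() is a hypothetical helper, not part of metapower.
# It repeats the expression used in the "d" branch of compute_variance().
d_variance <- function(n, d) {
  round((n + n) / (n * n) + d^2 / (2 * (n + n)), 5)
}

total_n  <- 66  # total participants per study, as entered in the report
n_groups <- 3   # number of subgroups

# First division: Cohen's d assumes two equal groups.
per_group <- total_n / 2

# Second division: equal-sized subgroups, the value described as
# being passed to compute_variance().
per_subgroup <- per_group / n_groups

# Assumed effect size of 0.3, chosen only for illustration.
v <- d_variance(per_subgroup, d = 0.3)

print(c(per_group = per_group, per_subgroup = per_subgroup, variance = v))
```

With these inputs, per_group is 33 (the number shown in the summary output) and per_subgroup is 11 (the number that enters the variance computation).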

Please let me know if you disagree; I of course want this package to be as accurate and correct as possible.

mcweenysean commented 3 years ago

Thanks for your response! That makes total sense. My real parameters are average n = 22, and I'm using three age groups as the moderator. In this case, I cannot enter sample_size = 22 and n_groups = 3 due to the error message (sample size must be a multiple of n_groups). I will likely just change to n = 21 in the meantime.

age_mod <- subgroup_power(n_groups = 3, effect_sizes = c(.3,.6,.9), sample_size = 21, k = 20, es_type = "d")
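
A small sketch of the workaround, assuming the only requirement is the one stated in the error message (sample_size must be a multiple of n_groups). Both helpers are hypothetical and not part of metapower.

```r
# Hypothetical helpers, not part of metapower.

# Largest multiple of n_groups that does not exceed sample_size.
lower_multiple <- function(sample_size, n_groups) {
  sample_size - (sample_size %% n_groups)
}

# Smallest multiple of n_groups that is at least sample_size.
upper_multiple <- function(sample_size, n_groups) {
  ceiling(sample_size / n_groups) * n_groups
}

n_down <- lower_multiple(22, 3)  # 21, the value used in the call above
n_up   <- upper_multiple(22, 3)  # 24, the alternative if rounding up

print(c(n_down = n_down, n_up = n_up))
```

Rounding down is the more conservative choice for a power estimate, since a smaller sample size cannot overstate power.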

ps looks like you're killing it on the publication front - congrats!

-- Sean McWeeny Doctoral Student, Communication Sciences & Disorders Northwestern University