Management of missing values in a paired t_test or pairwise_t_test: effect on sample size and beyond

Hello,

1) When NA values are passed to t_test() or pairwise_t_test(), they are not discarded when the sample size is reported. For example:

# Load rstatix (provides t_test() and the %>% pipe)
library(rstatix)

# Import data
data("ToothGrowth")
df <- ToothGrowth

# Perform unpaired and paired t-tests
df %>% t_test(len ~ supp)
df %>% t_test(len ~ supp, paired = TRUE)

# Replace one observation by NA
df$len[1] <- NA

# Redo unpaired and paired t-tests
df %>% t_test(len ~ supp)
df %>% t_test(len ~ supp, paired = TRUE)

The reported sample size is unchanged after introducing the NA. Could that be corrected? I think this point was also raised in issue #147, and in the resolved issue #104 for wilcox_test(). I guess, though, that the correct sample size is used to compute the t statistic, because the results match those of the base R function:

x <- df$len[df$supp == "OJ"]
y <- df$len[df$supp == "VC"]
t.test(x, y)
t.test(x, y, paired = TRUE)
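
One way to check this guess is to compare the number of usable observations with the degrees of freedom that base R reports. The sketch below uses base R only and assumes, as above, that row 1 of ToothGrowth (a VC observation) is the one set to NA; `n_x`, `n_y`, `n_pairs` and `res` are names introduced for illustration.

```r
data("ToothGrowth")
df <- ToothGrowth
df$len[1] <- NA  # row 1 belongs to the VC group

x <- df$len[df$supp == "OJ"]
y <- df$len[df$supp == "VC"]

# Observations available in each group once NA is dropped
n_x <- sum(!is.na(x))
n_y <- sum(!is.na(y))

# Complete pairs: both members non-missing (pairing by position)
n_pairs <- sum(complete.cases(x, y))

res <- t.test(x, y, paired = TRUE)

# For a paired t-test, df = number of complete pairs - 1
unname(res$parameter) + 1 == n_pairs
```

If the sample sizes shown by t_test() differ from `n_x`, `n_y` (unpaired) or `n_pairs` (paired), the reported sizes include the missing value.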

2) For paired tests, if one observation is NA in group 1, does the function:

- remove the paired observation in group 2?
- or estimate the missing value in group 1 through some imputation method?
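
For comparison, base R's t.test() takes the first route: with paired = TRUE it keeps only the pairs in which both values are present and does no imputation. Whether t_test() inherits exactly this behaviour is an assumption based on the matching results noted in point 1. A minimal sketch of the check, with `ok`, `with_na` and `manual` as illustrative names:

```r
data("ToothGrowth")
df <- ToothGrowth
df$len[1] <- NA

x <- df$len[df$supp == "OJ"]
y <- df$len[df$supp == "VC"]

# Paired test with the NA left in place
with_na <- t.test(x, y, paired = TRUE)

# Same test after manually dropping every incomplete pair
ok <- complete.cases(x, y)
manual <- t.test(x[ok], y[ok], paired = TRUE)

# Both comparisons return TRUE when the incomplete pair is simply dropped
all.equal(with_na$statistic, manual$statistic)
all.equal(with_na$p.value, manual$p.value)
```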

3) As far as I understand, observations are paired according to their order within each group in the dataset. It would be great to add an argument that lets the user supply the column used for pairing observations, just like the wid argument for mixed ANOVA in anova_test().
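
To illustrate what pairing by an explicit column could look like, here is a base R sketch. The `id` column is hypothetical (ToothGrowth has no subject identifier, so one is made up here), and the rows are shuffled to mimic a dataset whose row order does not match the pairing.

```r
data("ToothGrowth")
df <- ToothGrowth

# Hypothetical pairing id: the k-th row of each supp group gets id k
df$id <- rep(seq_len(30), times = 2)

# Reference result: pairing by row order in the original data
ref <- t.test(df$len[df$supp == "OJ"], df$len[df$supp == "VC"], paired = TRUE)

# Shuffle the rows so that order-based pairing would no longer be valid
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

# Pair explicitly by id: one row per id, one column per group
oj <- shuffled[shuffled$supp == "OJ", c("id", "len")]
vc <- shuffled[shuffled$supp == "VC", c("id", "len")]
wide <- merge(oj, vc, by = "id", suffixes = c("_OJ", "_VC"))

by_id <- t.test(wide$len_OJ, wide$len_VC, paired = TRUE)

# TRUE: pairing by id recovers the reference result despite the shuffling
all.equal(unname(by_id$statistic), unname(ref$statistic))
```

Until such an argument exists, sorting the data by group and then by the pairing column before calling t_test() would presumably have the same effect, assuming pairing is indeed order-based.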

4) Speaking of mixed ANOVA with anova_test(), I have the same kind of question as in point 2. Let's take the example you developed here:

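# Load the example data (requires the datarium package)
data("anxiety", package = "datarium")
# Reshape to long format and convert id and time to factors
anxiety <- anxiety %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
# Two-way mixed ANOVA: group (between-subjects) x time (within-subjects)
res.aov <- anova_test(
  data = anxiety, dv = score, wid = id,
  between = group, within = time)
get_anova_table(res.aov)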

And let's replace the first observation at time = t1 by NA:

anxiety$score[1] <- NA
res.aov <- anova_test(
  data = anxiety, dv = score, wid = id,
  between = group, within = time)
get_anova_table(res.aov)

Is there any imputation to estimate the effect of the within-subject factor time? Would you still do a paired pairwise t-test in this case? If yes:

- will the first observation at t2 and t3 be removed? (I guess yes, given #31)
- how would you specify the wid (cf. point 3)?
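
If the answer is that subjects with a missing score are dropped entirely, the equivalent manual step in long format would look roughly like the sketch below. It shows listwise (complete-case) deletion on a small made-up data frame and is not a description of what anova_test() does internally; `long`, `incomplete_ids` and `complete` are illustrative names and the scores are invented.

```r
# Toy long-format data: 4 subjects measured at 3 time points
long <- data.frame(
  id    = factor(rep(1:4, each = 3)),
  time  = factor(rep(c("t1", "t2", "t3"), times = 4)),
  score = c(NA, 15, 14,  16, 15, 13,  17, 16, 15,  18, 17, 14)
)

# Subjects with at least one missing score
incomplete_ids <- unique(long$id[is.na(long$score)])

# Drop every row of those subjects, not only the NA row,
# so that each remaining subject has all three time points
complete <- droplevels(long[!long$id %in% incomplete_ids, ])

# Each time point now has the same number of subjects
table(complete$time)
```

After this step, paired comparisons between time points use the same subjects at every level of time, which is what pairing by wid would require.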

Thank you for developing this package; it is really appreciated.

Cheers,

Florent

kassambara/rstatix, issue #175