tests under facet and strata

cicdguy commented 3 years ago

closes #1238 closes #1256

NEST/teal.modules.general/issues/1250

Additionally:
- add bins argument to tm_g_
- add additional input for "scales" when facetting is applied
- enable the NULL args for strata and facet
- add ANOVA test
- test code simplification

Provenance:

Creator: Polkas

cicdguy commented 3 years ago

I believe that scales = fixed should be the default.
Please add a horizontal scrollbar.
I also think we need to round the numbers sensibly (see the screenshot: it does not look good).

That's an interesting approach to presenting the output of tests. It has nice features such as easy comparison, table ordering by strata/facet, and ease of copying the result. On the other hand, I also see a big disadvantage: it cannot handle multiple tests if the user selects more than one (we should have a ticket for that). In other words: how would you combine multiple tables row-wise? What's your view on that? Initially I was expecting something like below:

FACET: VAR_Y = VAL1
STRATA: VAR_X = VAL2
> t.test(...)

    Welch Two Sample t-test

data:  1:10 and c(7:20)
t = -5.4349, df = 21.982, p-value = 1.855e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.052802  -4.947198
sample estimates:
mean of x mean of y 
      5.5      13.5 

------------------------
FACET: VAR_Y = VAL1
STRATA: VAR_X = VAL3
> t.test(...)

    Welch Two Sample t-test

data:  1:10 and c(7:20)
t = -5.4349, df = 21.982, p-value = 1.855e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.052802  -4.947198
sample estimates:
mean of x mean of y 
      5.5      13.5 

------------------------
...

From a technical perspective, the big advantage of the above is that it would be very easy to add new (even custom) tests. We could even expose it as an argument at some point. I think we all love more "open" software. From a stats perspective, it's also good that you present all of the output. You might get something that is not easy to format as a vector (and then append to a table).
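
The per-group printout described above could be produced roughly as follows. This is only a sketch in base R, not the module's code: `print_tests_by_group` and its arguments are hypothetical names, and the built-in `ToothGrowth` data stands in for the real dataset.

```r
# Hypothetical helper (not part of teal.modules.general): run `test_fun` on
# every facet/strata subset and print the full test output under a header.
print_tests_by_group <- function(data, facet, strata, test_fun) {
  # One subset per facet/strata combination; drop = TRUE skips empty ones.
  groups <- split(data, data[c(facet, strata)], drop = TRUE)
  results <- lapply(groups, function(d) {
    cat("FACET: ", facet, " = ", as.character(d[[facet]][1]), "\n", sep = "")
    cat("STRATA: ", strata, " = ", as.character(d[[strata]][1]), "\n", sep = "")
    res <- test_fun(d)
    print(res)
    cat("------------------------\n")
    res
  })
  # Return the test objects invisibly so they can be reused later.
  invisible(results)
}

# Usage: normality of tooth length within each supp/dose combination.
out <- print_tests_by_group(
  ToothGrowth,
  facet = "supp",
  strata = "dose",
  test_fun = function(d) shapiro.test(d$len)
)
```

Because `test_fun` is just a function of the subset, swapping in another test (or a custom one) would not require changes to the printing logic, which is the "open" design mentioned above.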

Provenance:

Creator: pawelru

cicdguy commented 3 years ago

In my opinion, supporting multiple tests, e.g. t.test and var.test at the same time, is too much.

The output you propose would have e.g. 14 * 10 (COUNTRY) lines = 140 lines.

This is how e.g. the broom package formats model output. broom might be used like:

> airquality %>% group_by(Month) %>% do(st = shapiro.test(.$Ozone)) %>% broom::glance(st)
# A tibble: 5 x 4
# Groups:   Month [5]
  Month statistic    p.value method                     
  <int>     <dbl>      <dbl> <chr>                      
1     5     0.714 0.00000829 Shapiro-Wilk normality test
2     6     0.843 0.0628     Shapiro-Wilk normality test
3     7     0.980 0.867      Shapiro-Wilk normality test
4     8     0.933 0.0903     Shapiro-Wilk normality test
5     9     0.784 0.0000433  Shapiro-Wilk normality test

I will rewrite it to use the broom package.
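
For reference, `do()` is superseded in current dplyr. A roughly equivalent sketch with `group_modify()` is shown here, assuming dplyr (>= 0.8.1) and broom are installed; it is an illustration, not necessarily what the module ends up using.

```r
library(dplyr)

# One row of test results per group: broom::glance() turns the htest object
# returned by shapiro.test() into a one-row tibble.
result <- airquality %>%
  group_by(Month) %>%
  group_modify(~ broom::glance(shapiro.test(.x$Ozone)))

result
```

Grouping by both the facet and strata variables in `group_by()` would give one row per combination, which keeps the table ordering by strata/facet discussed above.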

Provenance:

Creator: Polkas

cicdguy commented 3 years ago

@pawelru I applied broom::glance, so this is now a more general solution. The only thing still needed is the horizontal scroll you pointed out, since some tests now return more attributes; alternatively, we could dplyr::select certain columns.

I applied mutate_if with round and set the fixed scale as the default.
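
A minimal sketch of the column selection and rounding described here, assuming dplyr is installed. The input values are copied from the broom output earlier in this thread; the number of digits (3) is an assumption, not the value used in the module.

```r
library(dplyr)

# Values for Months 5 and 6 taken from the broom::glance output above.
res <- tibble(
  Month = c(5L, 6L),
  statistic = c(0.714, 0.843),
  p.value = c(0.00000829, 0.0628),
  method = "Shapiro-Wilk normality test"
)

# Keep a subset of columns, then round every numeric column to 3 decimals.
rounded <- res %>%
  select(Month, statistic, p.value) %>%
  mutate_if(is.numeric, round, 3)

# Alternative: keep 3 significant digits, so tiny p-values do not become 0.
signifed <- res %>%
  select(Month, statistic, p.value) %>%
  mutate_if(is.numeric, signif, 3)
```

Note that `round()` to 3 decimals turns the p-value 0.00000829 into 0, whereas `signif()` keeps it, so the latter may be closer to rounding the numbers "wisely".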

Provenance:

Creator: Polkas

cicdguy commented 3 years ago

Blocked for 2h. After a long call with @wwojciech, I will add new updates.

Provenance:

Creator: Polkas

insightsengineering / teal.modules.general

tests under facet and strata #5