scunning1975 / mixtape

Data and Program files for Causal Inference: The Mixtape
Other
385 stars 217 forks source link

Variant on tea.R #20

Closed iangow closed 1 year ago

iangow commented 3 years ago

This is just FYI.

I wanted a variant of tea.R where a more causal interpretation could be offered (there is a kind of causation going on here, as discussed below). The code here is a bit more general (and a bit shorter), as I put the guesses as a vector in one column.

library(tidyverse)
library(utils) # For combn function

n_cups <- 8L

# Can choose any n_cups/2 cups to have "milk first"
# The code in the book assumes (wlog) that these are 1, 2, 3, 4.
milk_first <- sample(n_cups, n_cups/2L)
milk_first
#> [1] 3 6 2 8
n_guesses <- length(milk_first)

guesses <-
  tibble(guesses = combn(n_cups, n_guesses, simplify = FALSE)) %>%
  rowwise() %>%
  mutate(correct = setequal(guesses, milk_first)) 

# Look at typical guesses. These are vectors.
guesses$guesses[1]
#> [[1]]
#> [1] 1 2 3 4
guesses$guesses[67]
#> [[1]]
#> [1] 4 5 6 8

sum(guesses$correct)
#> [1] 1
p_value <- mean(guesses$correct)
p_value
#> [1] 0.01428571

guesses %>% filter(correct) %>% select(guesses) %>% pull()
#> [[1]]
#> [1] 2 3 6 8

# Now suppose that protocol is changed. For each cup, Fisher tosses a coin:
# if heads, he puts in milk first; if tails, he puts in tea first.
# Muriel Bristol then sips each of the cups and declares either "milk first" or
# "tea first".
#
# With this protocol, it's easier to see a causal interpretation. Label the
# milk-first cases as treatment; tea-first cases as controls. The question is
# whether treatment leads to the outcome of "milk first". One has a sample of 
# n_cups independent observations and can use the regular framework. But here,
# we just assume that Bristol guessed all correctly and calculate exact p-value
# for that.
#
# (Note that one can apply a similar causal interpretation to the original 
#  protocol, but one couldn't apply the standard framework to it; 
#  no SUTVA for one, as Bristol will declear four and only four cases as 
#  "tea first". 
#
#  Alternatively, if Fisher did not tell Bristol that four
#  cups are milk-first and four are tea-first, then he could use the same 
#  standard framework, as the guesses would become independent again under
#  the sharp null that putting tea or milk first does not affect Bristol's 
#  guess.)
milk_first <- which(sample(c(TRUE, FALSE), n_cups, replace=TRUE))
milk_first
#> [1] 2 7

guess_n <- function(x) combn(n_cups, x, simplify = FALSE)

guesses <-
  tibble(guesses = unlist(lapply(0:n_cups, guess_n), recursive = FALSE)) %>%
  rowwise() %>%
  mutate(correct = setequal(guesses, milk_first))

sum(guesses$correct)
#> [1] 1
p_value <- mean(guesses$correct)
p_value
#> [1] 0.00390625

guesses %>% filter(correct) %>% select(guesses) %>% pull()
#> [[1]]
#> [1] 2 7

Created on 2021-02-27 by the reprex package (v1.0.0)