sa-lee / starmie

starmie: plotting and inference for population structure models :star2:
Other
12 stars 6 forks source link

simulate Q matrix with repeated runs from dirichlet distribution #21

Open sa-lee opened 8 years ago

sa-lee commented 8 years ago

see http://www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/dirichlet.pdf and https://en.wikipedia.org/wiki/Dirichlet_distribution

k <- 20
r <- 20
n_samples <- 100
# generate Q matrix by sampling from diricihlet
# keep alpha hyper param constant (same mean and variance for each K)
Q_1 <- MCMCpack::rdirichlet(n_samples, rep(1, k))
# permute labels by shuffling and adding gaussian noise noise over number of runs
permute_and_add_noise <- function(Q_mat, r) {
  logit <- function(x) { log(x / (1-x)) }
  perturb <- function(Q_mat) { 
    # add gaussian noise to each row of matrix
    q_random <- apply(Q_mat, 1, 
                      function(y) 1 / (1 + exp(-rnorm(length(y), 
                                                      mean = logit(y), 
                                                      sd = 0.01))))
    # normalise rows 
    q_normalised <- q_random / rowSums(q_random)
    # shuffle labels 
    q_normalised[, sample(seq_len(ncol(q_normalised)))]
  }

  replicate(r - 1, perturb(Q_mat))
}

permute_and_add_noise(Q_1, r)

Will have to fix labelling at some point but it is a start.

sa-lee commented 8 years ago

@gtonkinhill i think this might be reasonable for testing different label switching algs. should look into Stephens, 2000 as well.

sa-lee commented 8 years ago

As discussed the other day - these are the underlying problems here:

  1. Introducing noise to the Q-estimates (which the code in previous commit does do.)
  2. Introducing noise in the label switches
  3. Changing Dirichlet params to introduce multiple modes.

The suggestion by Melanie to solve 2. is to pass a confusion matrix as input for a sample being mislabelled. i.e. say we have a Q-matrix with correct labels then in each replicate we add noise to the Q and swap labels with some probability given by the confusion matrix.

Alternatively The label.switching package has a function called permute.mcmc that I think does what we want.