allow matrix parameters in multivariate distributions

goldingn commented 6 years ago

e.g. in dirichlet(), dirichlet_multinomial(), categorical() etc., the parameter (alpha or prob) can only be a vector of length k, and all n rows of the response are assumed to be identically distributed.

The parameters should be allowed to have n rows, each giving the parameters of the disteibution for the corresponding row of the output.

zamorarr commented 6 years ago

Thanks for this great package. I'm really enjoying it and appreciate all the work put into it. Just wondering if there is an update to this issue? I'd love to test out a multinomial that accepts matrix values for prob and a vector for size. Thanks!

goldingn commented 6 years ago

Sorry, still on the to do list! Should hopefully be implemented on the dev branch within the next month though.

In the meantime, you can always do this the slower way: https://github.com/greta-dev/greta/issues/120#comment-362489577

prodriguezsosa commented 6 years ago

Hi @goldingn,

Thanks for all the hard work on this package, I'm excited to start using it. I was wondering: (a) if you had managed to look at this issue at all or were planning to do so in the near future and (b) get your input on modeling a markov chain using Greta (might be a separate albeit related issue). Below is my attempt using a toy model:

#==============================
# TOY MARKOV CHAIN MODEL
#==============================  
library(greta)
library(dplyr)     
library(magrittr)
library(MCMCpack)

#==============================
# model
#==============================
N <- 5 # number of walkers
M <- 5 # number of steps
S <- 5 # number of states

# initial probabilities drawn from a dirichlet distribution
init <- rdirichlet(1, alpha = rep(1, S))

# transition probabilites drawn from a N(0,1) distribution
transit <- matrix(rnorm(S*S, mean = 0, sd = 1), nrow = 5, ncol = 5) 
transit <- transit %>% exp # exponentiate
transit <- transit/rowSums(transit) # row normalize to transform into transition probabilities

# function to simulate S step walks
simulateWalk <- function(init = init, transit = transit, steps = S){
  walk <- c()
  walk <- c(walk, sample(1:S, 1, prob = init)) # initial step
  for(s in 2:steps){
    walk <- c(walk, sample(1:S, 1, prob = transit[walk[s-1],]) )
  }
  return(walk)
}

#==============================
# simulate data
#==============================
SimData <- data.frame("walker_id" = rep(1:N, each = M), "state" = lapply(1:N, function(x) simulateWalk(init = init, transit = transit, steps = S)) %>% unlist)
SimData <- SimData %>% group_by(walker_id) %>% mutate(pre_state = lag(state, order_by = walker_id)) # lag state variable

#==============================
# adjust data to fit model
#==============================
# dependent variable (current state)
Y <- matrix(0, nrow = nrow(SimData), ncol = S) 
for(i in 1:nrow(Y)){
  Y[i, SimData$state[i]] <- 1
}

# independent variable (previous state)
X_init <- rep(c(1, rep(0,S - 1)), N) # indicates if obs is an initial step
X_pre <- matrix(0, nrow = nrow(SimData), ncol = S)  # selects relevant row of transition matrix (0 if initial state)
for(i in 1:nrow(X_pre)){
  if(!is.na(SimData$pre_state[i])){X_pre[i, SimData$pre_state[i]] <- 1}
}

# relevant probabilities are given by 
X <- X_pre%*%transit*(1-X_init) + rep(init, nrow(Y))*X_init

#==============================
# transformation functions to be used in Greta model
#==============================
# source: https://github.com/greta-dev/greta/issues/118
op <- greta::.internals$nodes$constructors$op
row_cumsum <- function (x) {
  stopifnot(length(dim(x)) == 2)
  op("row_cumsum",
     x,
     tf_operation = tensorflow::tf$cumsum,
     operation_args = list(axis = 1L))
}

# transform sampled transition matrix into transition probabilities
transformBeta <- function(beta){
  prob <- exp(beta) # exponentiate
  prob <- prob/row_cumsum(prob) # row-normalize
  return(prob)
}

#==============================
# Greta model
#==============================

# data
Y <- as_data(Y)
X_init <- as_data(X_init)
X_pre <- as_data(X_pre)

# variables & priors
pi <- dirichlet(alpha = rep(1, S), dim = 1)  # initial probabilities
beta <- normal(0, 1, dim = c(S, S))  # transition matrix

# transform variables
tbeta <- transformBeta(beta)
tpi <- matrix(rep(pi, N*M), nrow = N*M, ncol = S)

# operations
tprobs <- X_pre%*%tbeta*(1-X_init) + tpi*X_init # NEED HELP HERE

# likelihood
for (i in seq_len(nrow(Y))) {
  y <- t(Y[i, ])
  distribution(y) = categorical(t(tprobs[i, ])) 
}

# model
m <- model(pi, beta)

Does this look like a reasonable translation of a markov chain to Greta?

I'm having difficulties in the "operations" part, presumably because I can't simply multiply element-wise when using tensors. I guess I need to define a new function similar to "row_cumsum" but I haven't found much documentation on how to write these. Any leads and suggested documentation on how to manipulate Greta arrays would be greatly appreciated (I have some familiarity with Tensorflow), especially since I plan to estimate different models (e.g. disallow self loops by forcing the diagonal of the transition matrix to equal 0 etc.).

Finally, I was wondering what your progress on this issue -matrix parameters for multivariate distributions- was since using a loop is not ideal for large datasets.

Thank you in advance for your time!

goldingn commented 6 years ago

Heya. Sorry for the delay in replying, and in dealing with this issue. Development has been on hold for a bit because of other commitments, but I'm starting a two-week greta coding sprint today, so things should pick up fast.

So I'll be implementing the matrix-valued multivariate distributions on the dev branch either this week or next, and aiming to release of that branch by the end of the second week.

I'm not sure I understand where the problem is with the Markov chain. The answer to "how to manipulate Greta arrays" is that they should behave like R's arrays. And I'm planning to tidy up the process and documentation of implementing custom operations in this sprint. If that doesn't help, could you please post the Markov chain example in another issue?

prodriguezsosa commented 6 years ago

Thank you @goldingn. I should have been clearer.

Since it's not really an issue with Greta, I posted my question on Stack Overflow.

Look forward to testing the new release!

goldingn commented 6 years ago

I get you now.

The issue is that you are coercing a greta array to an R array, which is having a weird effect - pulling out the array that's just used for printing the unknown values of the greta array. You'll see that by doing:

class(pi)

[1] "greta_array" "array"  
class(tpi)
[1] "matrix"
(a matrix in R-parlance is a 2D array)

I'm surprised by that behaviour (which may be resolved by the fix to #83), so I'll open an issue to clean it up now.

This line of code will do what you want:

tpi <- do.call(rbind, replicate(10, pi))

That replicates pi 10 times, in a list with 10 elements, then combines those row-wise.

That's a little clunky, it would be nicer if you could use greta_array() in the way you used array(). That should be enabled by #99.

By the way, it's totally fine to ask questions about using greta in these issues! At least until I set up a dedicated discussion forum for those sorts of questions.

prodriguezsosa commented 6 years ago

Great, that indeed solves the issue, thank you @goldingn.

goldingn commented 6 years ago

I just pushed an experimental branch that implements matrix-valued multivariate distributions.

You can get it with: devtools::install_github("greta-dev/greta@multivariate_refactor"). It's ahead of the dev branch, so relies on the latest tensorflow (v1.10) and tensorflow probability (v0.3.0).

I'd really appreciate any feedback and bug reports! I'm planning to package this up with the release next week, so ideally feedback before then.

Note that the API has changed to enable this:

row vector parameters

The vector-like parameters of multivariate distributions (e.g. mean in multivariate_normal()) now have to be row vectors (previously they had to be column vectors). That's consistent with the dimension of the output. So you would do something like this:

mu <- t(rnorm(3))
sig <- diag(3)
X <- multivariate_normal(mu, sig)

It would error if you didn't transpose when defining mu.

`dim` -> `n_realisations`, `dimension`

The dim argument has been removed because it was confusing and didn't correspond to the dim for univariate distributions. There are now two new arguments: n_realisations gives the number of separate realisations of the distribution (rows in the output), and dimension gives the dimension of the distribution (columns in the output). dimension is usually implicit and detected from the parameters (except for lkj_distribution), but can be explicitly stated for more explicit and defensive code. So to get 5 realisations of the above distribution, the following are equivalent:

X <- multivariate_normal(mu, sig, n_realisations = 5)

X <- multivariate_normal(mu, sig, n_realisations = 5, dimension = 3)

Obviously, you can now use matrix-valued parameters:

Mu <- matrix(rnorm(15), 5, 3)
X <- multivariate_normal(Mu, sig)
dim(X)

[1] 5 3

Or equivalently:

X <- multivariate_normal(Mu, sig, n_realisations = 5, dimension = 3)

goldingn commented 6 years ago

I just merged it into dev, and deleted that branch. So to try it out you now need to do the usual: devtools::install_github("greta-dev/greta@dev")

greta-dev / greta