sbfnk / rbi

R package for Bayesian inference with state-space models using LibBi.
https://sbfnk.github.io/rbi/

Passing a data frame with both multiple samples and dimensions doesn't work #11

Closed BlackEdder closed 7 years ago

BlackEdder commented 7 years ago

If I read the LibBi documentation correctly, it should be possible to pass multiple samples for fitting to LibBi by using an ns dimension. On top of that, my data also has a second, user-defined dimension, lage, but when I try to pass this to rbi I get the following error:

Error in netcdf_create_from_list(filename, variables_with_dim, ...) : Could not decide on coord dimension between lagens

sbfnk commented 7 years ago

Could you try with the latest development version (installed using install_github)? There have been a few bug fixes around writing/reading files with multiple dimensions, and a new version will be released soon.
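For readers following along, installing the development version might look like the sketch below. It assumes the remotes package is used to provide install_github (devtools exports a function of the same name) and that the repository path is sbfnk/rbi, as in the page header. Neither detail is spelled out in the comment above.

```r
# Assumption: remotes supplies install_github(); devtools::install_github() is an alternative.
install.packages("remotes")

# Install the development version of rbi straight from GitHub.
remotes::install_github("sbfnk/rbi")
```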

If that fails, could you provide a minimal reproducible example?

BlackEdder commented 7 years ago

I'm afraid that did not work. Minimal example:

library(dplyr)
library(rbi)
library(rbi.helpers)
model_str <- "
model SI {
  const no_age = 5
  dim age(no_age)

  param mu[age]
  param shape[age]

  obs cnt[age]

  sub observation {
    cnt[age] ~ negbin(mu[age], shape[age])
  }

  sub initial {
  }

  sub parameter {
    mu[age] ~ truncated_gaussian(1.2, 0.1, lower = 0.01)
    shape[age] ~ truncated_gaussian(0.8, 0.1, lower = 0.01)
  }

  sub transition {
  }
}
"

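# Generate fake data: 100 counts, each assigned a random age in 0..4.
# ns is a zero-based sample index within each age group, so ages with
# more rows end up with more samples than others.
cnt_df <- data.frame(
  age = floor(runif(100, 0, 5)),
  value = rnbinom(100, mu = 10, size = 10)) %>%
  group_by(age) %>%
  mutate(ns = row_number() - 1) %>%
  ungroup()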

obs_lst <- list(cnt = cnt_df)

model <- bi_model(lines = stringi::stri_split_lines(model_str)[[1]])

bi <- rbi::sample(
  model = model,
  end_time = 100,
  obs = obs_lst,
  time_dim = "time",
  nsamples = 100,
  nthreads = 8,
  nparticles = 24
)

Resulting in error:

Error in netcdf_create_from_list(filename, variables_with_dim, ...) : Could not decide on coord dimension between age, ns

EDIT: Cleaned up the example to remove other errors.

sbfnk commented 7 years ago

Thanks. I get the same error.

BlackEdder commented 7 years ago

I hacked around this issue by treating ns as a special column, but now I get the following error:

Error in ncvar_put(nc, vars[[name]], values[[name]]) : ncvar_put: error: you asked to write 5865 values, but the passed data array only has 2985 entries!

This is most likely because I am using sparse data, with a lot of missing values. That is, max(ns)*max(age)==5865, but because I have more samples for certain ages, the actual number of values is 2985. I had a look at the code, but could not readily find where the expected size of the data is calculated. Any pointers?
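The mismatch described here can be reproduced on a toy data frame. The sketch below uses made-up values and hypothetical names (obs, grid, padded). It counts the cells in the full age by ns grid, compares that with the rows actually present, and then pads the gaps with NA using base R. Whether LibBi treats such NA cells as missing observations is an assumption that is not confirmed here.

```r
# Hypothetical sparse observations: age 0 has three samples, age 1 only one.
obs <- data.frame(
  age   = c(0, 0, 0, 1),
  ns    = c(0, 1, 2, 0),
  value = c(4, 7, 5, 9)
)

# Cells in the full age x ns grid versus rows actually present.
n_expected <- length(unique(obs$age)) * length(unique(obs$ns))
n_actual   <- nrow(obs)

# Build the full grid and left-join the data onto it; absent cells become NA.
grid   <- expand.grid(age = sort(unique(obs$age)), ns = sort(unique(obs$ns)))
padded <- merge(grid, obs, by = c("age", "ns"), all.x = TRUE)
padded <- padded[order(padded$ns, padded$age), ]
```

Comparing n_expected with n_actual before writing the file makes it clear early on whether the data are a complete grid or sparse.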

sbfnk commented 7 years ago

As of commit 4d7b3d879d3e66e851eba94176baa5068ae381f8, using an ns column should work. However, I don't think LibBi can handle sparsely sampled observations in different samples, i.e. it expects the same observations in every sample. You may have to use a separate observation data frame for each sample.
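One way to get a separate observation data frame per sample is to split on the ns column, as in the sketch below. The data are made up and the names obs and obs_by_sample are hypothetical. How each piece would then be handed to rbi (for example, one run per data frame) is not specified in this thread.

```r
# Hypothetical observations with an unequal number of samples per age.
obs <- data.frame(
  age   = c(0, 0, 0, 1, 1),
  ns    = c(0, 1, 2, 0, 1),
  value = c(4, 7, 5, 9, 3)
)

# One data frame per sample index, with the ns column dropped from each piece.
obs_by_sample <- lapply(
  split(obs, obs$ns),
  function(d) d[, c("age", "value")]
)
```

The list is named by sample index, so obs_by_sample[["0"]] holds the observations for the first sample.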

BlackEdder commented 7 years ago

> However, I don't think libbi can handle sparsely sampled observations in different samples, i.e. it expects the same observations in every sample

I was kinda afraid of that, but the manual seemed to suggest it should be fine with different samples. I'll give it a test :)

sbfnk commented 7 years ago

Closing, as the RBi issue is now fixed.