stan-dev / rstan

RStan, the R interface to Stan
https://mc-stan.org

Feature Request: Function to strip stanfit of extra parameters #426

Open aaronjg opened 7 years ago

aaronjg commented 7 years ago

Summary:

I often find myself capturing the entire output of all parameters, transformed parameters, and generated quantities when running a Stan model. However, this object is very large, in fact too large for tools like shinystan. It would be nice to have a built-in way to strip the stanfit object down to the essential variables.

Description:

I wrote the following function, which could potentially be integrated into the RStan package. I would open a pull request, but I'm not too familiar with S4 classes, so I don't know how to integrate it.

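# Keep only `pars` (plus lp__) in a stanfit object. This edits rstan
# internals (object@sim, object@par_dims), so it may break across versions.
cleanObject <- function(object, pars) {
  pars <- c(pars, "lp__")
  # Match "name" exactly or "name[...]", so "beta" does not also match "beta_raw"
  nn <- paste0("^", pars, "(\\[|$)", collapse = "|")
  ids <- grep(nn, object@sim$fnames_oi)
  for (i in seq_along(object@sim$samples)) {   # one element per chain, not always 4
    a <- attributes(object@sim$samples[[i]])
    x <- object@sim$samples[[i]][ids]
    for (j in c("names", "inits", "mean_pars"))
      a[[j]] <- a[[j]][ids]
    attributes(x) <- a
    object@sim$samples[[i]] <- x
  }
  # Subset each structure by its own names instead of reusing one index vector
  object@par_dims <- object@par_dims[names(object@par_dims) %in% pars]
  object@sim$dims_oi <- object@sim$dims_oi[names(object@sim$dims_oi) %in% pars]
  object@sim$pars_oi <- object@sim$pars_oi[object@sim$pars_oi %in% pars]
  object@sim$fnames_oi <- object@sim$fnames_oi[ids]
  object@sim$n_flatnames <- length(object@sim$fnames_oi)
  object
}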
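The core of the function above is the regular expression that decides which flattened names (such as `beta[1]`) survive. A minimal base-R sketch of just that selection step, runnable without a fitted model; the helper `keep_flatnames` and the example names are hypothetical:

```r
# Hypothetical helper reproducing the pattern built inside cleanObject():
# a flat name is kept if it equals one of `pars` exactly, or is an element
# of it (the parameter name immediately followed by "[").
keep_flatnames <- function(fnames, pars) {
  pars <- c(pars, "lp__")                              # always keep the log density
  pattern <- paste0("^", pars, "(\\[|$)", collapse = "|")
  fnames[grepl(pattern, fnames)]
}

fnames <- c("mu", "beta[1]", "beta[2]", "beta_raw[1]", "y_rep[1]", "lp__")
keep_flatnames(fnames, "beta")  # returns c("beta[1]", "beta[2]", "lp__")
```

The `(\[|$)` suffix is what stops `beta` from also matching `beta_raw[1]`.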

RStan Version:

packageVersion("rstan") [1] ‘2.15.1’

R Version:

bgoodri commented 7 years ago

I don't think we want to encourage mucking around with stanfit objects. Why aren't you specifying the pars argument when you call stan or sampling?
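For reference, the `pars` argument mentioned here acts at sampling time; combined with `include = FALSE`, it lists quantities to drop rather than keep. A sketch, assuming `mod` is a model compiled with `rstan::stan_model()`; the wrapper name and the default nuisance names are hypothetical placeholders:

```r
# Hypothetical wrapper: sample while leaving nuisance quantities out of the
# stored draws. Replace the `drop` defaults with your own model's variables.
sample_without <- function(mod, data, drop = c("y_rep", "log_lik"), ...) {
  rstan::sampling(mod, data = data, pars = drop, include = FALSE, ...)
}
```

The trade-off is that draws excluded this way are never stored, so they cannot be recovered without refitting.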

sakrejda commented 7 years ago

The 'pars' argument doesn't do it, because you want to examine all the parameters; they just don't fit in memory. I'm not going to run a model three times to look at different sets of parameters; that would be insane.

bgoodri commented 7 years ago

Then why aren't you using the pars argument to as.shinystan()?
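A sketch of that suggestion, assuming shinystan's `as.shinystan()` method for stanfit objects accepts a `pars` character vector; the wrapper name `launch_subset` is hypothetical:

```r
# Hypothetical wrapper: build a shinystan object from only some parameters
# of an existing stanfit, leaving the stanfit itself untouched.
launch_subset <- function(fit, keep) {
  sso <- shinystan::as.shinystan(fit, pars = keep)
  shinystan::launch_shinystan(sso)
}
```

This still requires the full stanfit to be loaded first, which is the memory limit raised in the replies.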

sakrejda commented 7 years ago

That's a good answer; I approve. Though sometimes you just don't have enough RAM to do that manipulation in R.

aaronjg commented 7 years ago

If I load the whole fitted model into memory and then launch shinystan, I run out of memory. So I have to load the fit, remove the extra parameters, run gc() or serialize the object, restart R and reload it, and only then launch shinystan. When I run the models on a remote cluster, it's also nice to be able to strip out the extra parameters there and analyze the results locally. I could create the shinystan object remotely and just copy that over, though...

matweldon commented 6 years ago

Quoting bgoodri: "I don't think we want to encourage mucking around with stanfit objects. Why aren't you specifying the pars argument when you call stan or sampling?"

Sometimes you don't know how huge your stanfit object will be until after it has used 24 hours of cluster time (x 4 cores x 6 array jobs on different datasets) to sample. Benefit of hindsight, etc. At that point it would be nice to have a utility to remove the warmup, perhaps thin the draws, and remove nuisance parameters, just so that it doesn't take hours to download the stanfit files.

Not providing a specific method for this encourages people to muck around with stanfit objects themselves, which is riskier.
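Short of a dedicated method, the reduction described above (drop warmup, thin, remove nuisance parameters) can be done on a plain draws array instead of on stanfit internals. A minimal sketch, assuming `draws` is the iterations x chains x parameters array from `rstan::extract(fit, permuted = FALSE, inc_warmup = FALSE)`, which already excludes warmup; the helper `prune_draws` and the synthetic data are hypothetical:

```r
# Hypothetical helper: thin a draws array and keep only the named
# parameters. `keep` takes flattened names such as "beta[1]".
prune_draws <- function(draws, keep, thin = 1L) {
  stopifnot(length(dim(draws)) == 3L, thin >= 1L)
  iters <- seq(1L, dim(draws)[1L], by = thin)   # every `thin`-th iteration
  cols <- dimnames(draws)[[3L]] %in% keep        # parameters to retain
  draws[iters, , cols, drop = FALSE]
}

# Synthetic stand-in for extracted draws: 1000 iterations, 4 chains, 3 parameters.
set.seed(1)
draws <- array(rnorm(1000 * 4 * 3), dim = c(1000, 4, 3),
               dimnames = list(iterations = NULL,
                               chains = paste0("chain:", 1:4),
                               parameters = c("mu", "tau", "lp__")))
small <- prune_draws(draws, keep = c("mu", "lp__"), thin = 10L)
dim(small)  # 100 4 2

# Save the reduced array on the cluster; copy this file instead of the stanfit.
path <- tempfile(fileext = ".rds")
saveRDS(small, path)
```

Working on the extracted array avoids touching `object@sim`, at the cost of no longer having a stanfit object to pass to stanfit-specific functions.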