stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
2.59k stars 370 forks source link

Don't require two Stan programs to use standalone GQs (add a flag to services to turn off GQ block) #2934

Open jgabry opened 4 years ago

jgabry commented 4 years ago

Summary:

This came up when @rok-cesnovar and I were discussing the PR to add standalone generated quantities to CmdStanR:

Using standalone generated quantities requires the user to keep two versions of their stan program, one without GQs (for MCMC) and one with GQs (for standalone GQs). This seems both unnecessary and error-prone (having to make edits in two places).

I propose we add a flag in the services that would run the Stan program without generated quantities.

Current Version:

v2.23.0

mitzimorris commented 3 years ago

the problem is this: class stan/services/util/mcmc_writer.hpp has function write_sample_params - relevant part is here: https://github.com/stan-dev/stan/blob/53fd0144fdcb769e0eda44e968c3672af4f2dd90/src/stan/services/util/mcmc_writer.hpp#L97-L112

is always writes out parameters, transformed parameters, and generated quantities all in one go - note args hardcoded to true, true (line 111)

you would need to add 1 or 2 args to set of sampler service methods and insure that these are passed along to the generate_transitions method which calles write_array

relevant files are:

src/stan/util/run_sampler.hpp
src/stan/util/run_adaptive_sampler.hpp
src/stan/util/generate_transitions.hpp
src/stan/sample/fixed_param.hpp

(writing this is giving me serious deja vu)

mitzimorris commented 3 years ago

this use case would be a good one to use when working through proposals for better/more fine-grained I/O.

jgabry commented 3 years ago

Thanks @mitzimorris. So the idea would then be to run generate quantities but not write them out? Or am I misunderstanding (quite possible)? I was thinking that we could just ignore the whole block so they don't get computed at all.

mitzimorris commented 3 years ago

So the idea would then be to run generate quantities but not write them out?

the generated C++ code (i.e., stan.hpp) doesn't exactly line up with the named block structure of a Stan program, hence the confusion here - write_array writes parameter and transformed parameter values as well as running generated quantities block code and writing results.

you can pass in a flag which tells write_array to return instead of running generated quantities block.

let's make this concrete - here's the example bernoulli.stan code with addition of ppc checks in generated quantities block:

data { 
  int<lower=0> N; 
  int<lower=0,upper=1> y[N];
} 
parameters {
  real<lower=0,upper=1> theta;
} 
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
generated quantities {
  int y_rep[N];
  for (n in 1:N)
    y_rep[n] = bernoulli_rng(theta);
}

here's the generated write_array function which will:

a) transform theta back to constrained parameter space, add to vector vars__ - lines 24-31

b) (do the same transformed parameters - in the ex, nothing to do) - not executed is both boolean flags are false

c) compute generated quantities and write them out - lines 39-48 - not executed if 2nd boolean flag is false

     1    template <typename RNG>
     2    inline void write_array(RNG& base_rng__, std::vector<double>& params_r__,
     3                            std::vector<int>& params_i__,
     4                            std::vector<double>& vars__,
     5                            bool emit_transformed_parameters__ = true,
     6                            bool emit_generated_quantities__ = true,
     7                            std::ostream* pstream__ = nullptr) const {
     8      using local_scalar_t__ = double;
     9      vars__.resize(0);
    10      stan::io::reader<local_scalar_t__> in__(params_r__, params_i__);
    11      static const char* function__ = "bernoulli_ppc_model_namespace::write_array";
    12  (void) function__;  // suppress unused var warning
    13  
    14      (void) function__;  // suppress unused var warning
    15  
    16      double lp__ = 0.0;
    17      (void) lp__;  // dummy to suppress unused var warning
    18      stan::math::accumulator<double> lp_accum__;
    19      local_scalar_t__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
    20      (void) DUMMY_VAR__;  // suppress unused var warning
    21  
    22      
    23      try {
    24        double theta;
    25        theta = std::numeric_limits<double>::quiet_NaN();
    26        
    27        current_statement__ = 1;
    28        theta = in__.scalar();
    29        current_statement__ = 1;
    30        theta = stan::math::lub_constrain(theta, 0, 1);
    31        vars__.emplace_back(theta);
    32        if (logical_negation((primitive_value(emit_transformed_parameters__) ||
    33              primitive_value(emit_generated_quantities__)))) {
    34          return ;
    35        } 
    36        if (logical_negation(emit_generated_quantities__)) {
    37          return ;
    38        } 
    39        std::vector<int> y_rep;
    40        y_rep = std::vector<int>(N, std::numeric_limits<int>::min());
    41        
    42        current_statement__ = 4;
    43        for (int n = 1; n <= N; ++n) {
    44          current_statement__ = 3;
    45          assign(y_rep, cons_list(index_uni(n), nil_index_list()),
    46            bernoulli_rng(theta, base_rng__), "assigning variable y_rep");}
    47        for (int sym1__ = 1; sym1__ <= N; ++sym1__) {
    48          vars__.emplace_back(y_rep[(sym1__ - 1)]);}
    49      } catch (const std::exception& e) {
    50        stan::lang::rethrow_located(e, locations_array__[current_statement__]);
    51        // Next line prevents compiler griping about no return
    52        throw std::runtime_error("*** IF YOU SEE THIS, PLEASE REPORT A BUG ***"); 
    53      }
    54      } // write_array()