Closed: sakrejda closed this issue 6 years ago.
Thanks for writing this down. However, I think differently about how to handle the within-iteration (per-step and similar) reporting. As outlined, the design seems to focus on data sent per iteration and data sent once (except for the string logger). I don't think that's sufficient: my original use case (more info about divergent transitions) requires sending info only for some iterations. Some extensions I can think of (e.g. "almost divergent transitions") may also need to send multiple records per single iteration. I therefore think the design has to incorporate different sets of data sent at different frequencies, at least: per iteration, per step, and per special condition (e.g., divergent transition).
While I'd guess that in practice storing the full trajectory will mostly be used with restarting and for selected iterations, I don't see any advantage in designing only for this use case. There should simply be a (compile-time or runtime) switch to record per-step information.
I also don't like having a `write_trajectory` method, as that implies the sampler will need to store the per-step data in some intermediate in-memory storage. I'd rather go with multiple different "streams" of sampler-state data with different frequencies, where one stream stores the per-step info.
On the other hand, I think that for all of the different writing frequencies we actually want to write a subset of {iteration number, constrained params, unconstrained params, transformed params, gradients, momenta}, so there is a place for an abstraction such as `write_sampler_state()` that is reused per iteration, per step, and per special condition. I am not sure whether the subset should be the same for all frequencies or whether we should allow separate configurations for each, but I am slightly in favor of a unified configuration. We might, though, need to write some additional information for the special "streams" (e.g. direction and tree depth for the per-step info, or the difference between the Hamiltonian at the start and end of the trajectory when reporting some sort of "approaching divergence, but not divergent yet" steps for divergence diagnostics).
Also, it might be sensible to pass transformed data once at the beginning of the algorithm. The structure should IMHO follow signatures similar to the generated quantities (both are a mix of double and int).
@martinmodrak If you look at the code, we go through contortions to send things at the same time, so there definitely is a need to allow sending separately. On the writer end things can be combined, but it doesn't make sense to enforce that on the C++ side.
I don't see in your write-up what you think does make sense. I can go look at where this data becomes available, but I thought you'd have some specific suggestions.
Re: transformed data: sure, we could add that, but it's not necessary while we're doing a refactor, and it follows the same pattern as GQ, so I'm punting for now.
there definitely is a need to allow sending separately
I don't really follow your thoughts here. What "sending" are you referring to? Do you think there is an actual advantage in buffering data on the C++ side before sending it to the writer/interface? Or are you speaking about bundling the various types of data (sampler params, constrained + unconstrained params, gen quants) together?
In general, I think it makes sense to stream things as they become available (e.g. spit out per-step or per-divergence info immediately in the code that detects it) and only "label" where they belong (per iteration, per step, per divergence, ...), which could be done by calling a specific relay method or by an additional parameter to a fixed relay method. This should IMHO result in the least invasive changes to the sampler code. If there is a desire for buffering, it should be handled within the relay or within the interface-specific implementation.
By "sending" I mean the point in the algorithm C++ code where you write `relay.write_some_stuff(stuff);`. From reading your second paragraph, I think we agree on "send as it becomes available". Can you point me to where you would do this for the per-trajectory info?
For NUTS, I would call the relay from `base_nuts::build_tree`, both for per-step (trajectory) info and for per-divergence info.
Transformed data is like GQ and like parameters, but only gets output once, not once per iteration.
What is in scope here? If it's just a refactor, wouldn't that take out any trajectory information?
Even if the first version is just a refactor, we need to make sure the design will be able to accommodate these things (output that's irregular [only some iterations], output that occurs multiple times in one iteration, and output that's once per run).
I agree. Buffering should be a filter, not built into the API.
From @bob-carpenter, re "what is in scope here?" and:
Even if the first version is just a refactor, we need to make sure the design will be able to accommodate these things (output that's irregular [only some iterations], output that occurs multiple times in one iteration, and output that's once per run).
Yes, this is the goal. The first PRs should be primarily refactor so we can avoid too many interface-side changes, but we need the design to include:

- transformed data
- buffering as a filter
- irregular output

I'll edit the issue to address these things.
A few comments, apologies if they mix discussions from this email thread and the corresponding Discourse thread.
The way we laid out the code, the services API was designed to be the only API. In other words, there is no generic C++ API. The contract is that the client provides a model instance and the appropriate callbacks (mostly writers) and calls a given service route, and the API takes care of the rest. All of the code not in the services folder is internal to the API. The current difficulty in passing a complete configuration through these service routes makes the API not particularly pleasant to use, but regardless, that was the intended design.
From that perspective, the operative question here is whether to replace the writer callbacks with some other form of callback. The proliferation of writers is indeed a mess, but we have to be careful about possible replacements. For example, monolithic callbacks capable of using algorithm-specific information would either have to be aware of the entire API, and hence break the API abstraction, or we'd have to define callbacks unique to each service route, which would be its own maintenance mess.
That said, there is only so much information that the algorithm can pass, and there are some relatively natural abstractions. Service routes can supply:

- Information, warning, or error messages meant to be directed to the user
- Configuration information
- Initialization information
- State information, including the latent unconstrained parameter space, transformed parameters, generated quantities, and algorithmic state.
- Termination information.

(1) can be handled with a logger capable of handling different states of output. (2) can be handled with a key-value abstraction. (3) and (5) can be handled with loggers or key-values, depending on the specific information. (4) can be handled with a monolithic vector paired with a configuration vector of names and possibly categories for what part of the state a given element belongs to (unconstrained, constrained, transformed, generated, algorithm, etc.).
Ultimately we don't need to define custom callbacks for each algorithm, nor would we really want to. Instead, I think the proper abstraction here is a communication callback pattern capable of handling text messages with a caution level, key-value pairs, and state information. In particular, this pattern would allow algorithms to be added and updated without having to modify the writer callbacks defined in the interfaces, and vice versa, which is a good indication that it's a step towards the right abstraction.
I played around with a very preliminary version of this a while back but it was never a priority. In any case, it's relatively straightforward to define a callback scheme and put together an exemplar C++ implementation.
- C++ API vs Services API
I'm pretty sure everyone agrees that the services API should be the single point of contact for the interfaces.
Even if the sole client of a lower-level C++ API is the services API, it'll still be worth writing the code this way so that the basic functions can be tested. It's a huge pain to test math functions through indirect I/O interfaces, callbacks, etc. That's why I suggested putting some of these functions in the math library, but I've been convinced that because they're primarily estimators whose behavior we might change, they should instead be treated like our algorithms rather than like our math lib.
- Write Refactor/Replacement
For example, monolithic callbacks capable of using algorithm-specific information would either have to be aware of the entire API, and hence break the API abstraction. Or we'd have to define callbacks unique to each service route which would be its own maintenance mess.
Exactly my objection to CmdStan, and why I wanted to break services into individual functions (I'm not sure if there's a one-to-one relation between what I'm calling services functions and what Michael's calling a "route").
Each such function will get its own custom writer made up of a few shared components that can have common implementations.
This is also the plan for simplifying the service functions, but that's a separate topic.
That said, there is only so much information that the algorithm can pass and there are some relatively natural abstractions. Service routes can supply
- Information, warning, or error messages meant to be directed to the user
- Configuration information
- Initialization information
- State information, including the latent unconstrained parameter space, transformed parameters, generated quantities, and algorithmic state.
- Termination information.
(1) can be handled with a logger capable of handling different states of output.
I think everyone agrees about this. And that this is the thing where we won't need to worry about backward compatibility.
(2) can be handled with a key-value abstraction.
Yes, but it's non-trivial in C++. Are there types, or is everything a string? Do we use variant types to make everything type safe, or some kind of C-style dynamic-cast thing like the one underlying the config for CmdStan?
(3) and (5) can be handled with loggers or key-values depending on the specific information.
I'm not sure what initialization information is. Does that mean things like the initial parameter values? If that is just logging and key-values, then everything is!
(4) can be handled with a monolithic vector paired with a configuration vector of names and possibly categories for what part of the state a given element belongs (unconstrained, constrained, transformed, generated, algorithm, etc).
I couldn't quite follow that. Is the output some kind of tagged vectors, i.e., more key values? Or is it like a database table with headers and streaming? I think this all needs to be handled in such a way that the regular draws can be sent to one place, unconstrained draws to a second place, and diagnostics to a third, along with things like adaptation, which go yet other places.
Ultimately we don't need to define custom callbacks for each algorithm, nor would be really want to.
I agree. There are only a handful of possible callback types after you unpack the complexity in (4). Each algorithm uses a custom combination of simpler loggers and writers so that the interfaces can have clean low-level implementations of these.
Instead I think that the proper abstraction here is a communication callback pattern capable of handling text messages with caution level, key-value pairs, and state information.
That's not very specific, but I think it's in line with Krzysztof's design goals and how Martin's already started coding.
In particular, this pattern would allow algorithms to be added and updated without having to modify the writer callbacks defined in the interfaces, and vice versa, a good indication that it's a step towards the right abstraction.
I don't think we can update without modifying callbacks. If we want to add per-iteration trajectory information, the interfaces have to figure out somewhere to put it. Or are you saying everything just gets buffered into some kind of giant object on the interface side? I'd think things would have to be streamed out into different files, for example. That isn't going to happen by magic.
I also think new algorithms are a fine place for interfaces to cobble together the relevant callbacks. This isn't hard work.
There actually are multiple layers of internal C++ API in Stan. Even though we haven't made any promises about these, it's not like they're going to change anytime soon.
Instead I think that the proper abstraction here is a communication callback pattern capable of handling text messages with caution level, key-value pairs, and state information. In particular, this pattern would allow algorithms to be added and updated without having to modify the writer callbacks defined in the interfaces, and vice versa, a good indication that it's a step towards the right abstraction.
We've discussed this along the way, including Allen's (slightly extreme) suggestion of just making all the I/O JSON (which also allows you to add algorithms without having to modify the writer callbacks in the interfaces). I think it's a good set of abstractions to have internally.
What this approach doesn't communicate is how a user of the API should use the output of the algorithms. This leads to incompatible and rigid dependencies on odd stuff like the exact message formats in the interfaces. While the approach here is a work in progress, the relay classes should be relatively simple and straightforward to modify for new algorithms; internally they will definitely be constructed from the abstractions you mention (key/value + logger + table, basically).
Going through the algorithm code now, since I sort of assumed it would make sense to let the algorithm code individually write things (like it does now). It might make more sense to have a relay method that's a friend of the algorithm classes for handling `stan::mcmc::sample` and `stan::mcmc::base_nuts`. Not sure what's the right way to go; it's obviously easier to keep the current usage in the algorithms.
The central logger thing is being done instead, so this is dated now. Closing.
This issue lists the current outputs, and then a minimal set of signatures (those are the two sections below). After each sub-section there's a list of issues with implementing this, some of which I still need to check against the code.
Currently defined outputs

Services-defined:

- `double` values representing the time taken for $A$ algorithm stages (warmup, sampling in MCMC).

Model-file defined:

Model-file defined for sampling:

Model-file defined, for HMC:

Model-file defined for optimization:

Model-file defined for ADVI:

Model-file defined, for whatever algorithm:

Algorithm-defined for HMC:

- `double`, per-iteration during warmup (fixed in sampling), so $N_w$

Algorithm-defined for BFGS/LBFGS:

Algorithm-defined, for Newton optimization:

Algorithm-defined, for ADVI:
Signatures a `relay` needs to handle

A relay is composed of writers; it's the only object passed into `services` to handle output. On construction the relay needs:

- `std::vector<std::string> parameter_names`
- `std::vector<std::string> transformed_parameter_names`
- `std::vector<std::string> generated_quantity_names_r`
- `std::vector<std::string> generated_quantity_names_i`

The model-file-defined required signatures are:

- `relay.write_parameters(std::vector<double>);`
- `relay.write_unconstrained_parameters(std::vector<double>);`
- `relay.write_transformed_parameters(std::vector<double>);`
- `relay.write_generated_quantities_r(std::vector<double>);`
- `relay.write_generated_quantities_i(std::vector<int>);`
- `relay.write_parameter_gradients(std::vector<double>);`

Issues:

- ... (`model.write_array`) so we should evaluate whether it's feasible to undo that in the sampling code.
- `mcmc_writer.write_sample_params` collapses the algorithm-defined parameters and the model-file-defined parameters (lines 94, 95, 101). For the above signatures we would need to undo that and write the algorithm parameters to a separate `relay` signature.

Model-file-defined for sampling, so in a sampling-specific sub-class:

- `relay.write_parameter_momenta(std::vector<double>);`

Model-file-defined for optimization, so in an optimization-specific sub-class:

- `relay.write_hessian(Eigen::MatrixXd hessian)` (CHECK: is this generated or just rstan? Q: should we generate this?)

Model-file-defined for ADVI, so in an ADVI-specific sub-class:

- `relay.write_covariance(Eigen::MatrixXd covariance)`
Algorithm-related:

- `relay.write_iteration(int iteration);`
- `relay.write_stage(std::string stage);` <- this should be "warmup"/"sampling" for MCMC, could indicate different warmup stages within HMC with adaptation, and could be "optimizing"/"converged" for optimization. Since we often want to know the timing information stage-wise, the relay class could run a timer and record how long each stage took as well.

Algorithm-defined, HMC:

- `relay.write_log_density(double);`
- `relay.write_stepsize(double);`
- `relay.write_tree_depth(int);`
- `relay.write_acceptance_rate(double);`
- `relay.write_energy(double);`
- `relay.write_n_leapfrog(int);`
- `relay.write_divergence(bool);`
- `relay.write_mass_matrix(Eigen::VectorXd mm);` <- @mitzimorris
- `relay.write_mass_matrix(Eigen::MatrixXd mm);` <- @mitzimorris
- `relay.write_trajectory(???);` <- @martinmodrak

Issue: These are actually grouped in various ways.
Algorithm-defined, BFGS/LBFGS:

- `relay.write_log_density(double);`
- `relay.write_dx_magnitude(double);`
- `relay.write_grad_magnitude(double);`
- `relay.write_alpha(double);`
- `relay.write_alpha_0(double);`
- `relay.write_n_evals(double);`

Issue: These are actually grouped in various ways.

Algorithm-defined, BFGS/LBFGS, CHECK THESE:

Algorithm-defined, Newton:

- `relay.write_log_density(double);`
- `relay.write_delta_log_density(double);`

Algorithm-defined, ADVI:

- `relay.write_ELBO(double);`
- `relay.write_delta_ELBO_mean(double);`
- `relay.write_delta_ELBO_med(double);`
- `relay.write_iteration_time(double);` // standardize? time_type?
- `relay.write_eta(double);` (called once)

Issue: These are actually grouped in various ways.
Issues:

Incorporating the logger, using parameter packs to make life easier:

- `relay.debug(...);`
- `relay.info(...);`
- `relay.warn(...);`
- `relay.error(...);`
- `relay.fatal(...);`

Grouping within the algorithms:

HMC:

- `stan/mcmc/sample.hpp` has: `const Eigen::VectorXd& q, double log_prob, double stat`
- `sampler` types ... (?)

Optimization:

ADVI: