virtual declarations missing for `get_constrained_sizedtypes()` and `get_constrained_sizedtypes()` in `model_base`

andrjohns commented 1 year ago

Summary:

stanc outputs methods for returning a json-formatted string of the parameter metadata, like the following for the bernoulli model:

  inline std::string get_constrained_sizedtypes() const {
    return std::string("[{\"name\":\"theta\",\"type\":{\"name\":\"real\"},\"block\":\"parameters\"}]");
  }
  inline std::string get_unconstrained_sizedtypes() const {
    return std::string("[{\"name\":\"theta\",\"type\":{\"name\":\"real\"},\"block\":\"parameters\"}]");
  }

But these methods don't have corresponding virtual methods in src/model/model_base.hpp

Current Version:

v2.31.0

WardBrian commented 1 year ago

I’m not sure why these methods exist in the generated model to be honest, since nothing currently seems to use them.

Do you have a use in mind for them?

andrjohns commented 1 year ago

I was after a way of getting the dimensions of the unconstrained parameters, and get_unconstrained_sizedtypes() looks like the only option

andrjohns commented 1 year ago

Another option could be to parse the output of unconstrained_param_names(), but also open to other suggestions if there's something obvious I'm missing!

bob-carpenter commented 1 year ago

I was after a way of getting the dimensions of the unconstrained parameters

What's the eventual goal?

I missed when those functions got added and would not have approved. I don't see the point of get_constrained_sizetypes() (or the prefix "get_"), as it just returns a string encoding name/size pairs. It'd be better to return a pair consisting of a std::vector<std::string> of strings and a std::vector<size_t> of sizes.

WardBrian commented 1 year ago

They were added back in 2019 but never exposed or used anywhere: https://github.com/stan-dev/stanc3/pull/226

Using std::vector<size_t> for objects is more or less singlehandedly holding up the tuples PR at this point, so avoiding using that is a win in my eyes.

The JSON these return is much richer anyway since it lets you distinguish matrix[3,2] and array[3,2] real, for example, and is probably the part of the compiler which was most ready for tuples/non-rectangular types, and would probably be useful for cmdstanpy/r when it comes time to process CSV outputs which contain tuples

andrjohns commented 1 year ago

What's the eventual goal?

I'm updating the handling of unconstrained pars in cmdstanr to be structured using the draws formats from posterior (motivation in this issue).

For that I need to know their dimensions - given that they can differ from the constrained parameter dimensions returned by get_dims()

bob-carpenter commented 1 year ago

I wrote back on the original issue asking why. I don't understand the motivation and have the same concerns as @mitzimorris expressed about the proposed solution.

I've always found the R devs to have almost diametrically opposed preferences to mine, which is for minimal but composable (i.e., unix-like) APIs. For example, I prefer the simple direct manipulations of NumPy to the do-it-all functions of the Tidyverse.

andrjohns commented 1 year ago

I wrote back on the original issue asking why. I don't understand the motivation and have the same concerns as @mitzimorris expressed about the proposed solution.

Note that the proposed solution from the linked post (a named list) has since been eschewed in favour of the draws formats - which is what the results of sampling/optimisation/etc (i.e., the constrained parameters) currently use. In other words, it's aligning the APIs for the constrained and unconstrained parametetrs.

At a higher level, the motivation is that there are multiple use-cases which need to operate on the unconstrained parameters (e.g., moment-matching, power-scaling, plotting). Users/packages currently need to manually track the indexes of particular parameters in the single unlabeled numeric vector of values when they're performing operations or constructing their own unconstrained parameter inputs - not very user-friendly or robust to error.

Additionally, the current single numeric vector structure isn't terribly amenable to a workflow operating on multiple iterations across multiple chains.

Given that we have an existing API in the draws format for operating on parameters from multiple chains, as well as access to the model functions for unconstrained parameter dimensions, it seems like a good quality-of-life improvement for users

andrjohns commented 1 year ago

And also to clarify that the proposed solution in the linked post was using the existing structure for specifying initial values that is used by both rstan and cmdstanr

stan-dev / stan

virtual declarations missing for `get_constrained_sizedtypes()` and `get_constrained_sizedtypes()` in `model_base` #3175

Summary:

Current Version: