xoopR / distr6

R6 object-oriented interface for probability distributions.
https://xoopr.github.io/distr6/

Correct/most-efficient implementation of parameterisation #58

Closed RaphaelS1 closed 5 years ago

RaphaelS1 commented 5 years ago

distr6 lets the user choose distribution parameterisation by allowing the user to construct the distribution using any combination of common parameters. For example the Normal distribution:

Normal$new(mean = 0, sd = 1)
Normal$new(mean = 0, var = 1)
Normal$new(mean = 0, prec = 1)

In doing so, all other parameter combinations must remain updatable. For example, when using Normal(mean, sd), the parameters var and prec remain in the parameter set and are updated automatically via the functions sd^2 and sd^-2; analogously when constructing with var or prec.
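The automatic updates described above can be sketched as simple conversion functions (illustrative only; the actual update functions live inside the distr6 constructors):

```r
# Illustrative update functions for the Normal parameterisations above:
# constructing with sd keeps var and prec in sync via these mappings.
sd2var  <- function(sd) sd^2
sd2prec <- function(sd) sd^-2

sd2var(2)   # 4
sd2prec(2)  # 0.25
```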

The problem:

Currently all the parameterisations are hard-coded with update functions into the constructor and the correct one is chosen depending on the constructed distribution. This is inefficient and somewhat messy. One alternative would be to name each ParameterSet for example NormVar, NormSd and NormPrec and then select the correct one in construction. Although this may add an unnecessary layer of abstraction...

fkiraly commented 5 years ago

Yes, this is an interesting question - sorry that we didn't have time yesterday to discuss it through.

I know of no precedent where this would be solved well, but it probably exists since parameterization is a general problem in scientific computing.

Here's one suggestion (not necessarily the best one):

  1. settle on a "default" parameterization, say, mean plus variance for normal
  2. define a function getSettableParamsFrom that takes as its argument a named list of parameter-name/value pairs. The names can differ, but the return value is always a named list of parameter-name/value pairs in which the parameters are the "default" parameters. For example, getSettableParamsFrom(mu = 1, std = 2) would return list(mu = 1, var = 4). Arguments should always be a full specification of the parameter set.
  3. Then, setparams is always called with default parameters. If setparams is called with "non-default" parameters, it calls getSettableParamsFrom first.
  4. the same behaviour in the constructor.
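The four steps above could be sketched as follows (a minimal sketch, assuming mean/var as the Normal's default parameterisation; the function name follows the suggestion and is not an existing distr6 API):

```r
# Convert any supplied parameterisation to the "default" (mean, var) one.
getSettableParamsFrom <- function(params) {
  if (!is.null(params$var))  return(list(mean = params$mean, var = params$var))
  if (!is.null(params$sd))   return(list(mean = params$mean, var = params$sd^2))
  if (!is.null(params$prec)) return(list(mean = params$mean, var = 1 / params$prec))
  stop("no dispersion parameter supplied")
}

# setparams (and the constructor) would then only ever see default parameters:
getSettableParamsFrom(list(mean = 1, sd = 2))  # list(mean = 1, var = 4)
```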

There's one downside to the above: it does not allow changing sub-sets of "non-default" parameters that map bijectively onto a subset of "default" parameters, e.g., changing the standard deviation, or the variance, in isolation. This could be added in by letting getSettableParamsFrom, on occasion, return a list with only some of the default parameters rather than all of them.

However, this cannot be generalized to a case where there is not a subset correspondence, as there are cases where the setting of parameters depends on a choice of "what to leave fixed". Consider as an example the case where you have 10 parameters a0, a1, ..., a9, all of them non-negative and adding up to the number one. This could also be parameterized by sums of any pairs of the ai. It's ill-defined to say "change [a0 + a1] to 0.5".

RaphaelS1 commented 5 years ago

Okay so if I understand this correctly, the following would occur:

  1. User calls Normal$new(mean = 0, sd = 1)
  2. Internally constructor calls private$.getSettableParamsFrom(list(mean=0, sd=1)). This function internally runs something like
    sd2var <- function(sd) return(sd^2)
    var <- sd2var(1)
    return(list(mean = mean, var = var))
  3. That return is used to construct parameters of the distribution.

My questions then:

  1. What about other possible parameterisations? e.g. using precision. The original idea was to allow the user to see the values of all possible parameters, even if not settable.
  2. Following from 1), if we allow multiple parameterisations then internally getSettableParamsFrom needs something like sd2var, prec2var, var2sd, var2prec, etc., if we want to show the values of each in the ParameterSet.
  3. In using this, can we remove the idea of a settable parameter? If the user can pass any parameter to setParameterValue and then getSettableParamsFrom will adjust accordingly, then we don't need to restrict the user to which parameters they can set.
fkiraly commented 5 years ago
  1. the user would always be able to see the possible parameters (via get), but internally, setting is always reduced to setting the default parameters.
  2. Yes - or dispatch-on-argument, in which you have different conversion methods depending on the argument - similar to how you can call R modelling methods on a formula, or on an explicit data model specification, e.g., the option to call lm with different arguments. This may be a bit cumbersome with S3 (a bit of manual coding), but is doable with S4 or R6, I believe.
  3. No, I think the information about which parameters are settable is still valuable, since it defines which parameters the interface passes to the outside - e.g., if you use the distribution in inference, tuning, etc. It's a "CS-modelling" statement about the scientific type, rather than a pure interface functionality (in which case it would be obsolete, as you conclude).
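The dispatch-on-argument idea in point 2 can be sketched in S3 by tagging the supplied parameterisation with a class (all names here are hypothetical, for illustration only):

```r
# Conversion to the default (mean, var) parameterisation, dispatched on
# a class tag that records which parameterisation the user supplied.
toDefault <- function(p) UseMethod("toDefault")
toDefault.sdParam   <- function(p) list(mean = p$mean, var = p$sd^2)
toDefault.precParam <- function(p) list(mean = p$mean, var = 1 / p$prec)

p <- structure(list(mean = 0, sd = 3), class = "sdParam")
toDefault(p)  # list(mean = 0, var = 9)
```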
RaphaelS1 commented 5 years ago

No, I think the information about which parameters are settable is still valuable, since it defines which parameters the interface passes to the outside - e.g., if you use the distribution in inference, tuning, etc. It's a "CS-modelling" statement about the scientific type, rather than a pure interface functionality (in which case it would be obsolete, as you conclude).

But then perhaps "settable" is the wrong term. To me, "settable" implies that the user isn't able to call setParameterValue(param) if param is not settable. But in the above scenario they can call setParameterValue on any parameter (even one not used in construction).

fkiraly commented 5 years ago

Not sure. However, this question highlights an important point, namely two distinct cases, where (a) the user interacts with the distribution (b) some other machinery, e.g., estimator or learning machine, accesses the distribution.

In case (a), I think one can argue that it should behave as a user naive to the interface but expert in the methodology would expect. That is, it would set all the parameters while the "settables" remain as they are. In case (b), you need some flag telling a learning machine which parameters it should consider the distribution as parameterized in.

RaphaelS1 commented 5 years ago

In case (b), you need some flag telling a learning machine which parameters it should consider the distribution as parameterized in.

However, as all the formulae are based on the default parameterisation (as per your first argument), arguably it doesn't matter what a learning machine does, since "settable" refers only to "can this parameter value be updated?". Which is not the same as "can this parameter value be estimated?", i.e. fittable.

fkiraly commented 5 years ago

Regarding (b), there can be a difference in what an estimator/learner does algorithmically with a distribution, depending on the parameterization. Example: gradient-descent based MLE for fitting parameters, e.g., whether you do this in terms of variance, precision, or log-variance.

RaphaelS1 commented 5 years ago

Understood. My concern is as follows: whilst I agree the current parameterisation method is messy and could benefit from abstracting the ParameterSet out of the constructor, there is a physical limit to how much we can abstract. More precisely, due to R6 limitations a distribution's ParameterSet has to be in the constructor and not a public variable. This means that the same constructor has to contain a set of conditionals identifying which parameters are "settable" based on how the user constructed the distribution (again, this has to be the case as R6 doesn't handle overloading of constructors). Therefore I'm not sure a getSettableParams method adds anything, as we still have the same problem.

fkiraly commented 5 years ago

It would be a generic - in construction you set once which parameters are settable, these may not be the "default" ones. Internally, this is always converted to a representation in terms of the "default" ones.

The added value is that this design allows the same object to behave, seen from externally, like "distribution with settable parameter set [theta]" for different theta, such as mean/variance, or mean/precision.

How it behaves (i.e., which theta) would be determined in construction. Internally, you would always have conversion to the default parameter set.

RaphaelS1 commented 5 years ago

Okay, so semi-pseudo-code for the above is as follows:

In the Normal class we have the private method:

.getSettableParams <- function(mean, var, prec, sd){
  if(!is.null(var))  return(list(mean = mean, var = var))
  if(!is.null(prec)) return(list(mean = mean, var = prec^-1))
  if(!is.null(sd))   return(list(mean = mean, var = sd^2))
}

Then the constructor:

initialize <- function(mean, var, prec, sd){
  parameterset <- function(...){
    if(!is.null(var)) # then var = settable, prec = not_settable, etc.
  }
  setParameterValue(.getSettableParams(mean, var, prec, sd))
}
fkiraly commented 5 years ago

Yes - plus possible input checks, in case the user doesn't pass a mean, say, or multiple contradictory values for dispersion parameters.

RaphaelS1 commented 5 years ago

Okay, agreed. I think getSettableParams should be a private method that only the distributions which require it have (as opposed to an S3 generic).

RaphaelS1 commented 5 years ago

Writing this up, I'm not sure it is actually more efficient. Currently ParameterSet is quite fast and efficient (aside from the constructor problem) because of the updateFunc argument and the update method that dynamically updates based on any parameters. In the method suggested above, each time a parameter is updated we have to call getSettableParams, which has to choose the correct functions to update the other parameters, and then all the validation checks on the given arguments (e.g. class, boundaries) are performed.

RaphaelS1 commented 5 years ago

A better compromise may be to have .getParameterSet, which stores all the possible variants with settable and updateFunc included; this is then called as one line in the constructor, and all the checks are performed within it.
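One way this compromise might look (a sketch only; the field names settable and updateFunc follow the discussion, not a fixed distr6 structure):

```r
# Returns the full parameter set for the chosen parameterisation, with
# settable flags and update functions for the derived parameters.
.getParameterSet <- function(mean, var = NULL, prec = NULL, sd = NULL) {
  if (!is.null(var)) {
    return(list(
      mean = list(value = mean,      settable = TRUE),
      var  = list(value = var,       settable = TRUE),
      sd   = list(value = sqrt(var), settable = FALSE,
                  updateFunc = function(var) sqrt(var)),
      prec = list(value = 1 / var,   settable = FALSE,
                  updateFunc = function(var) 1 / var)
    ))
  }
  # ... analogous branches for sd and prec
}

ps <- .getParameterSet(mean = 0, var = 4)
ps$sd$value    # 2
ps$prec$value  # 0.25
```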

fkiraly commented 5 years ago

Hm, @PeterRuckdeschel @stamats - any good suggestions or lessons learnt from your ParamFamily design?

RaphaelS1 commented 5 years ago

I have just implemented the following in the Normal distribution:

There is no logical flag for the reference parameter, e.g. no "reference" column in the paramset with TRUE/FALSE elements, but I think this is probably easy to deduce by looking at "updateFunc".

Finally, I added a "verbose" option so users can turn off the messages that appear in construction telling them which parameterisation is used.

How does this sound?

fkiraly commented 5 years ago

Makes sense to me - let's see whether it's easy to maintain and/or easy to use.

PeterRuckdeschel commented 5 years ago

Sorry for jumping in late -- you indeed hit a very interesting topic where a good design could really pay off later on.

I think the way you address the topic is fine. Actually, also the idea to have a default / reference parametrization which is used by all setters is a good concept.

The issue that the information given as an argument is not yet sufficient to identify the parameter could be solved as follows: starting from a (complete) reference parameter expressed in the reference parametrization, you could then allow changing only those aspects of the reference parametrization which are affected by the setter. Doing so, you can indeed use polymorphism / method dispatch, as is done for S3 and S4 methods, to allow setters to react to different named arguments.

Second, I would suggest that you try to allow (optionally) binding gradient information to the setter: if the reference parametrization is expressed in terms of theta and you have a mapping tau(theta) which (in some possibly different parametrization) moves theta to a parameter theta' (in the reference parametrization), then for gradient-descent-type algorithms you might want information on the derivative tau'(theta), or, if 2nd-order algorithms are to be used, even the Hessian tau''(theta) could be helpful.
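As a concrete sketch of binding gradient information to a setter (assuming a log-variance reparameterisation as the example; all names here are hypothetical):

```r
# Reparameterisation tau: log-variance -> variance, paired with its
# derivative for gradient-descent-type fitting algorithms.
tau      <- function(theta) exp(theta)  # variance as a function of log-variance
tau_grad <- function(theta) exp(theta)  # d/dtheta exp(theta) = exp(theta)

setter <- list(map = tau, grad = tau_grad)
setter$map(0)   # variance 1 at log-variance 0
setter$grad(0)  # derivative 1
```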

PeterRuckdeschel commented 5 years ago

Some of this is covered by the concept of partial influence curves (see chap. 4 / Def. 4.2.10 in Rieder (1994): Robust Asymptotic Statistics, Springer), which are meant to cover the situation where one is interested in a (smooth) aspect tau(theta), with tau some arbitrary smooth transformation of theta with derivative tau'(theta). In distrMod, this is captured by the slot trafo in the S4 class "ParamFamParameter".

fkiraly commented 5 years ago

Sorry, crossing posts - this is referring to @RaphaelS1 's earlier post (pre- @PeterRuckdeschel )

Hm, good point - this is an inconsistency that may bother users in the rare (?) case they construct a distribution "this" way and want to change them "that" way.

Do you think we should prevent users from doing that, so both users and other parts of the interface are prevented/allowed changing parameters the same way?

I'm wondering though whether we wouldn't be introducing a source of botherance in order to remedy a subjective design inconsistency?

PeterRuckdeschel commented 5 years ago

Just a little comment from "outside":

I think the "source of botherance" Franz mentions is worth it -- a good idea would probably be to have a clearly communicated standard setter, but besides that you should also provide alternative setters, even if they come with extra effort through checking overheads, finding the right mappings, etc.

Let me try to convince you that this is not a subjective design inconsistency but will happen quite frequently, whenever you consider functions tau(theta) instead of theta itself. With this in mind, it is a natural situation, and you want to be able to deal with it.

As an example look at the situation where you want to have a recursive update procedure for aspects tau(theta), or even more concrete, you want a recursive scheme for updating parametric normal quantiles. The reference parameter space is built by theta=(mu, sigma), mu in R, sigma >0, the parametric quantile would be q=tau(theta)=mu+sigma*q0, where q0 is the (known) quantile of N(0,1), mu, sigma in the end need to be estimated from data.
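In code, the quantile aspect from this example is simply (a sketch; q0 taken at the 95% level for concreteness):

```r
# tau(theta) = mu + sigma * q0, with q0 the known quantile of N(0, 1)
q0  <- qnorm(0.95)
tau <- function(mu, sigma) mu + sigma * q0

tau(0, 1)  # equals q0 itself for the standard normal
```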

For whatever reason you have noticed a shift in the quantile from q to q' and want to move your distribution accordingly.

To see what happens, it helps to linearize tau about the starting point theta0 = (mu0, sigma0), i.e., tau(theta) = tau(theta0) + d/dmu tau(theta0) (mu - mu0) + d/dsigma tau(theta0) (sigma - sigma0) + an ignored remainder. At first sight, as you are only interested in changes in the quantile, this only seems to affect the subspace (of the reference parametrization) induced by the difference Delta(tau(theta)) from one quantile estimate to the next. But then it may easily happen that the new combination [changed aspects in the reference parametrization induced by Delta(tau)] x [unchanged aspects in the reference parametrization] is no longer admitted in the parameter space (in our case meaning, e.g., that you cross 0 for sigma), so changes induced by Delta(tau) may trigger further changes in the kernel space of Delta(tau).

Other natural examples with similar issues cover multivariate normals where you either have the covariance matrix as parameter or the standard deviations and the bivariate correlations, or, third, for model selection, it is not uncommon to parametrize this distribution by the inverse covariance matrix (as 0 in there indicate conditional independence in a graphical model).

So all the painful features Raphael mentions, i.e., choosing the correct functions to update other parameters and then performing all the validation checks on the given arguments (e.g. class, boundaries), are not signs of bad design but arise in a natural way.

RaphaelS1 commented 5 years ago

Hi @PeterRuckdeschel , your comments are very much appreciated, especially as we finalise design choices that we don't want to regret down the line, such as this!

I believe what I have now implemented covers the basics as suggested, however my concern is whether the current method is overly simple.

The method works in three parts:

  1. Every distribution has default/reference parameters
  2. Every distribution has 'settable' parameters
  3. Every distribution's ParameterSet shows all possible parameterisations, which update accordingly

These are all linked by the functions: getParameterSet, getSettableParams, update and setParameterValue. I believe these should aptly cover your example above but perhaps @fkiraly you could confirm this if/when you've had time to look at the code?

PeterRuckdeschel commented 5 years ago

I am completely fine with this -- it is both efficient and consistent. The only point I was trying to make is to allow for other setters, if needed, and that your infrastructure can cope with these alternative setters later on.

fkiraly commented 5 years ago

@PeterRuckdeschel I think we disagree about whether we disagree: my comment about botherances was only about whether non-settable parameters should still be settable by the user or locked to the user.

Just to be clear, I think we don't disagree on key requirements and high-level interface.

I agree with you, and @RaphaelS1 (and indeed suggested sth along these lines earlier in this thread?) that distributions should be accessible through different parameterizations, but of course the user/API needs to state clearly which parameterization to consider the distribution in.

fkiraly commented 5 years ago

@RaphaelS1 , where are getParameterSet, getSettableParams, etc?

RaphaelS1 commented 5 years ago

(and indeed suggested sth along these lines earlier in this thread?)

I credit you fully with this design!

getParameterSet lives in a script called getParameterSet.R with the generic/dispatch methods; update is in ParameterSet.R; setParameterValue is in ParameterSet.R, with specific distributions overloading it; getSettableParams is a private method in each class definition.