xoopR / distr6

R6 object-oriented interface for probability distributions.
https://xoopr.github.io/distr6/

Implementation of Conditional, Compound, Truncated/Huberized and Lebesgue Distributions #1

Closed RaphaelS1 closed 5 years ago

RaphaelS1 commented 5 years ago

Should these be implemented as:

i) Inherited classes
ii) Composite classes
iii) Wrappers (i.e. composite classes without being formally defined as a new class, identified via some sort of flag, e.g. Truncated=TRUE)

RaphaelS1 commented 5 years ago

Truncated/Huberized distributions (as far as I am aware) affect a few statistical functions but generally are implemented in the same way as any other distribution, hence a wrapper should suffice that simply marks whether the distribution has been truncated/huberized and then the p/r/q/d functions are adapted as required by the wrapper.
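For illustration, the rescaling such a truncation wrapper would perform on p/d/q/r can be sketched in a few lines of base R (the function and argument names here are hypothetical, not a proposed API; a standard Normal stands in for the wrapped distribution):

```r
# Hypothetical sketch: truncate a distribution, given by its p/d/q functions,
# to [lower, upper] by rescaling, as the wrapper would do internally.
truncate_pdqr <- function(p, d, q, lower, upper) {
  mass <- p(upper) - p(lower)  # probability retained inside [lower, upper]
  list(
    d = function(x) ifelse(x < lower | x > upper, 0, d(x) / mass),
    p = function(x) pmin(pmax((p(x) - p(lower)) / mass, 0), 1),
    q = function(u) q(p(lower) + u * mass),        # rescaled inverse cdf
    r = function(n) q(p(lower) + runif(n) * mass)  # sampling via the quantile
  )
}

tn <- truncate_pdqr(pnorm, dnorm, qnorm, lower = -1, upper = 1)
tn$p(1)  # 1: all mass lies inside the truncation bounds
tn$p(0)  # 0.5 by symmetry
```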

Similarly Mixing Distributions could also be covered by a wrapper/composite-class that adds the additional variables 'distributions' and 'weights', then wraps p/d/q/r as required. Although other properties/traits will change so perhaps a new class is preferred?
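A minimal base-R sketch of what such a mixing wrapper would compute (all names hypothetical): d and p are weighted sums of the components, and r first samples a component index according to the weights:

```r
# Hypothetical sketch of a mixture wrapper: given component d/p/r functions
# and weights, the mixture d/p are weighted sums and r is component-sampling.
mixture_pdr <- function(ds, ps, rs, weights) {
  weights <- weights / sum(weights)  # normalise the weights
  list(
    d = function(x) Reduce(`+`, Map(function(w, d) w * d(x), weights, ds)),
    p = function(x) Reduce(`+`, Map(function(w, p) w * p(x), weights, ps)),
    r = function(n) {
      k <- sample(seq_along(weights), n, replace = TRUE, prob = weights)
      vapply(k, function(i) rs[[i]](1), numeric(1))
    }
  )
}

# Equal-weight mixture of N(-2, 1) and N(2, 1)
mx <- mixture_pdr(
  ds = list(function(x) dnorm(x, -2), function(x) dnorm(x, 2)),
  ps = list(function(x) pnorm(x, -2), function(x) pnorm(x, 2)),
  rs = list(function(n) rnorm(n, -2), function(n) rnorm(n, 2)),
  weights = c(1, 1)
)
mx$p(0)  # 0.5 by symmetry
```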

For conditional distributions, the 'best' way to do this is still open to question. This will depend on how much information is required to be kept on both distributions and which methods/variables will have to be adapted.

RaphaelS1 commented 5 years ago

Copied from @PeterRuckdeschel's email regarding conditional distributions:

Well, the wrapping should do. Essentially two routes could be viable: (1) in a wrapping strategy, similar to our distrMod approach, implement conditioning as a particular "parametrization"; then, similar to our models, separate distributions and conditions (as we do with parameters and distributions), and in addition provide maps condition -> distribution, so that you do not need to provide new distribution functionals.

(2) (which is our current way in distrEx) add additional infrastructure for all distributional operations (p, d, q, r; E(), sd(), var(), other functionals) which can digest additional information/arguments provided by a condition. As to performance I have no clear-cut opinion, but I would tend to say that, in contrast to models, where the parameter value does not change too often and so need not be evaluated too often, in our applications we needed the condition to be evaluated on a large set of possible values, which is why I would think that (2) has some advantages.

PeterRuckdeschel commented 5 years ago

As to Truncate/Huberize -- these are not generating new classes -- they are simply methods which generate new distributions from old ones.

Just a comment on why (some) mixtures should be a class of their own: look at UnivarLebDecDistribution. This class really is needed, as many innocent operations are not closed for discrete or abscont distributions; e.g. consider X*Y, with X a.c. and Y discrete with P(Y=0)>0. To see that we may also need mixtures of more than just two components, see how we deal with the image law of X*Y when the distributions of X and Y are given: we write X as a mixture of the 5 components X.ac.+, X.ac.-, X.d.+, X.d.-, X0 (and the same for Y), where X.ac.* is the abscont part and X.d.* the discrete part of the r.v. (*=+,-), X.*.+ is L(X.* | X.* > 0) and X.*.- is L(X.* | X.* < 0) (*=ac,d), and X0 = L(X|X=0); then, on the absolute values of X.ac.* and X.d.*, we write |U|*|V| as exp(log(|U|)+log(|V|)), and for the sum of the logs we use our convolution.

PeterRuckdeschel commented 5 years ago

Sorry, not yet quite familiar with the issue tracker; my comment was not meant as the final word on this...

RaphaelS1 commented 5 years ago

I think sticking with the distr approach for Truncate/Huberize is sensible. These are wrappers of the form Truncate(dist, lower, upper) and Huberize(dist, lower, upper). However, I think adding two composite classes (that inherit from Mixing/Conditional models and the original distribution as required), TruncatedDistr and HuberizedDistr, has the added advantage of giving access to the original class properties while overloading any required functions (e.g. p/d/q/r/mean, ...) at construction.
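A rough R6 sketch of this composite-class idea, assuming the R6 package is installed (the class names and the base-class stub are hypothetical illustrations, not the eventual distr6 API):

```r
library(R6)

# Hypothetical base class standing in for distr6's Distribution
Distribution <- R6Class("Distribution",
  public = list(
    p = function(x) stop("abstract"),
    d = function(x) stop("abstract")
  )
)

Normal <- R6Class("Normal", inherit = Distribution,
  public = list(
    p = function(x) pnorm(x),
    d = function(x) dnorm(x)
  )
)

# Composite wrapper: keeps the original distribution accessible while
# overloading p/d with the truncated versions at construction time.
TruncatedDistr <- R6Class("TruncatedDistr", inherit = Distribution,
  public = list(
    wrapped = NULL, lower = NULL, upper = NULL,
    initialize = function(dist, lower, upper) {
      self$wrapped <- dist; self$lower <- lower; self$upper <- upper
    },
    p = function(x) {
      w <- self$wrapped
      (w$p(pmin(pmax(x, self$lower), self$upper)) - w$p(self$lower)) /
        (w$p(self$upper) - w$p(self$lower))
    },
    d = function(x) {
      w <- self$wrapped
      ifelse(x < self$lower | x > self$upper, 0,
             w$d(x) / (w$p(self$upper) - w$p(self$lower)))
    }
  )
)

td <- TruncatedDistr$new(Normal$new(), -1, 1)
td$p(0)          # 0.5 by symmetry
td$wrapped$p(0)  # original, unwrapped cdf: 0.5
```

Because TruncatedDistr inherits from the Distribution stub, an instance still passes `inherits(td, "Distribution")`, which is the dispatch benefit of the composite class.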

In regards to the mixing distributions and LebDec, I agree and I'll take another look at your documentation specifically for these to determine how to carry them forward to R6.

PeterRuckdeschel commented 5 years ago

I think sticking with the distr approach for Truncate/Huberize is sensible. These are wrappers of the form Truncate(dist, lower, upper) and Huberize(dist, lower, upper). However, I think adding two composite classes (that inherit from Mixing/Conditional models and the original distribution as required), TruncatedDistr and HuberizedDistr, has the added advantage of giving access to the original class properties while overloading any required functions (e.g. p/d/q/r/mean, ...) at construction.

Fine with this. Note that in distr we already expose the properties of the mixture components for both class UnivarMixingDistribution and class UnivarLebDecDistribution: if MD is a UnivarMixingDistribution, mixDistr(MD)[[i]] will give you the i-th mixture component (as a univariate distribution) and mixCoeff(MD)[i] will give you the weight of the i-th component. Similarly, for LD a UnivarLebDecDistribution, discretePart(LD) and acPart(LD) return the discrete [a.c.] part of LD (as DiscreteDistribution [AbscontDistribution]) and discreteWeight(LD), acWeight(LD) the respective weights. In addition, d.ac, p.ac, q.ac, r.ac and d.discrete, p.discrete, q.discrete, r.discrete are convenience wrappers for s(acPart(LD)) and s(discretePart(LD)) respectively, with s one of d, p, q, r.

In regards to the mixing distributions and LebDec, I agree and I'll take another look at your documentation specifically for these to determine how to carry them forward to R6.

fkiraly commented 5 years ago

My suggestion would be as discussed yesterday:

Truncate, Huberize, etc., are classes inheriting from distributions, and are wrappers-by-constructor, or compositors-by-constructor, similar to how GridSearchCV in scikit-learn behaves.

The "wrapped" distribution sits in a dedicated parameter field, and there are "parameters from truncate/huberize" and "parameters from wrapped distribution" that can be called and accessed jointly, in a way that is similar or identical to mlr's or sklearn's composite parameter interface.
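A hedged sketch of what that joint parameter access could look like in R6, assuming the R6 package is installed (the sklearn-style "wrapped__" prefix and all class names are assumptions for illustration only):

```r
library(R6)

# Hypothetical sketch: a wrapper stores the wrapped distribution in a
# dedicated field and exposes its own parameters jointly with the wrapped
# ones, prefixed sklearn-style ("wrapped__mean" etc.).
Normal <- R6Class("Normal",
  public = list(
    mean = NULL, sd = NULL,
    initialize = function(mean = 0, sd = 1) {
      self$mean <- mean; self$sd <- sd
    },
    parameters = function() list(mean = self$mean, sd = self$sd)
  )
)

Truncate <- R6Class("Truncate",
  public = list(
    wrapped = NULL, lower = NULL, upper = NULL,
    initialize = function(dist, lower, upper) {
      self$wrapped <- dist; self$lower <- lower; self$upper <- upper
    },
    parameters = function() {
      inner <- self$wrapped$parameters()
      names(inner) <- paste0("wrapped__", names(inner))  # prefix wrapped params
      c(list(lower = self$lower, upper = self$upper), inner)
    }
  )
)

tr <- Truncate$new(Normal$new(mean = 3, sd = 2), lower = 0, upper = 6)
names(tr$parameters())  # "lower" "upper" "wrapped__mean" "wrapped__sd"
```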

RaphaelS1 commented 5 years ago

To confirm I understand @fkiraly's suggestion, by example: a decision tree in mlr has the following class structure classif.rpart -> RLearnerClassif -> RLearner -> Learner (where '->' means 'inherits from'). A decision tree in a bagging wrapper has the following: BaggingWrapper -> HomogeneousEnsemble -> BaseWrapper -> Learner. Applying a tuning wrapper to this gives: TuneWrapper -> OptWrapper -> BaseWrapper -> Learner.

The BaseWrapper class adds the next.learner method to every wrapper which calls the original (unwrapped) object. Then the next child class (in this case Ensemble or Opt) adds further methods specific to ensembling/tuning.

In our case the structure would be something like:

Distribution <- WrappedDistribution <- TruncatedDistribution
Distribution <- WrappedDistribution <- HuberizedDistribution
Distribution <- WrappedDistribution <- MixingDistribution
Distribution <- WrappedDistribution <- MixingDistribution <- LebDecDistribution

I am unsure if HuberizedDistribution should be a sub-class of MixingDistribution or not as this is only true in some cases. I am also unsure if LebDecDistribution should be a sub-class of WrappedDistribution or MixingDistribution.

Arguably we could then also add Distribution <- WrappedDistribution <- ConditionalDistribution

PeterRuckdeschel commented 5 years ago

Thanks for this clarification.

Yes, the inheritance structure Distribution <- WrappedDistribution <- MixingDistribution <- LebDecDistribution is fine (and in fact very close to ours).

As to TruncatedDistribution/HuberizedDistribution, I am not sure whether we need these special classes at all, as the return value of such an operation is not really so different from a "standard" LebDecDistribution, and as of now I do not really see a benefit where it would pay off to treat a Huberized/Truncated distribution differently from a LebDecDistribution in subsequent method dispatch. What we do need is a dispatched method Truncate/Huberize to produce truncated/huberized distributions, because the operation of truncation/huberization is somewhat different for discrete, abscont and mixed distributions.

fkiraly commented 5 years ago

Hm, I'm not sure whether a generic WrappedDistribution adds anything, since the user would only want to use Truncate, Huberize, etc. Unless of course there are methods you can usefully inherit, but I'd see that as the only case for the generic "Wrapped". @RaphaelS1, why would you like a generic Wrapped?

Regarding mixtures, it may be worthwhile to think carefully about whether you want to separate these into type-homogeneous and type-inhomogeneous mixtures. Note that Huberized is a specific kind of type-inhomogeneous mixture.

RaphaelS1 commented 5 years ago

@fkiraly the main advantage of the generic wrapped class is, as in mlr, to have a generic function that gives access to the original distribution. I agree most of the 'original' methods are altered once truncated/huberized, but perhaps some of the original properties/traits are still informative? I think a generic wrapped class only makes sense if we can think of wrappers that may be implemented in the future that would benefit from the original infrastructure, as building several distinct wrappers could quickly become complicated. And I believe (@PeterRuckdeschel or @stamats please correct me if I'm wrong) that in distr, mixtures are separated as you suggest.

@PeterRuckdeschel The only added benefit of the TruncatedDistribution and HuberizedDistribution classes is that the user can quickly see that the distribution has been truncated/huberized; whether this is useful I am not entirely sure. As these would directly inherit from the relevant Distribution and copy the relevant properties/traits, it wouldn't be too much extra work to create them if the trade-off (added information) is indeed helpful.

fkiraly commented 5 years ago

@RaphaelS1, yes, makes sense - the user, of course, wouldn't want to use the "generic wrapped" so you may have to decide whether it makes sense based on inheritance.

Regarding the type, I think it would make sense if you could query it and it tells you "this is a Huberized homogeneous mixture of Gaussians", e.g., upon a scientific type query it returns something like Huberized(HomMix(Normal)) or similar.

fkiraly commented 5 years ago

On a side note, it would be quite interesting if this can be easily translated to tidyverse pipe notation, e.g., Normal %>% HomMix(someparams) %>% Huberize(limits)

RaphaelS1 commented 5 years ago

On a side note, it would be quite interesting if this can be easily translated to tidyverse pipe notation, e.g., Normal %>% HomMix(someparams) %>% Huberize(limits)

Yes it can be. This is a case of adding a dependency on the package 'magrittr' and using R62S3. Then all R6 methods can be piped in this way (especially as R6 methods conventionally return invisible(self), which makes method chaining very natural).
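The chaining that the pipe builds on can be seen in a minimal hypothetical R6 class (assuming the R6 package is installed), where a mutating method returns invisible(self):

```r
library(R6)

# Minimal illustration of why R6 chains naturally: mutating methods
# return invisible(self), so calls can be strung together (and hence
# piped once exported to S3 generics via something like R62S3).
Accumulator <- R6Class("Accumulator",
  public = list(
    total = 0,
    add = function(x) {
      self$total <- self$total + x
      invisible(self)  # return self invisibly to enable chaining
    }
  )
)

a <- Accumulator$new()
a$add(1)$add(2)$add(3)  # method chaining
a$total                 # 6
```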

RaphaelS1 commented 5 years ago

distr example:

Norm() %>% Huberize(1,2) %>% mixDistr()

fkiraly commented 5 years ago

Ah, I see! So it was useful writing R62S3 after all...

RaphaelS1 commented 5 years ago

Well yes...and it will save hundreds of lines of code down the line when we want to make use of dispatch!!!

PeterRuckdeschel commented 5 years ago

@fkiraly the main advantage of the generic wrapped class is, as in mlr, to have a generic function that gives access to the original distribution. I agree most of the 'original' methods are altered once truncated/huberized, but perhaps some of the original properties/traits are still informative? I think a generic wrapped class only makes sense if we can think of wrappers that may be implemented in the future that would benefit from the original infrastructure, as building several distinct wrappers could quickly become complicated. And I believe (@PeterRuckdeschel or @stamats please correct me if I'm wrong) that in distr, mixtures are separated as you suggest.

@PeterRuckdeschel The only added benefit of the TruncatedDistribution and HuberizedDistribution classes is that the user can quickly see that the distribution has been truncated/huberized; whether this is useful I am not entirely sure. As these would directly inherit from the relevant Distribution and copy the relevant properties/traits, it wouldn't be too much extra work to create them if the trade-off (added information) is indeed helpful.

I agree; as it is not much of a burden to keep this information, it is probably nice to have. The only con I could think of is the maintenance of this class in subsequent versions.

PeterRuckdeschel commented 5 years ago

On a side note, it would be quite interesting if this can be easily translated to tidyverse pipe notation, e.g., Normal %>% HomMix(someparams) %>% Huberize(limits)

Indeed. This would be cool.

PeterRuckdeschel commented 5 years ago

@RaphaelS1, yes, makes sense - the user, of course, wouldn't want to use the "generic wrapped" so you may have to decide whether it makes sense based on inheritance.

Regarding the type, I think it would make sense if you could query it and it tells you "this is a Huberized homogeneous mixture of Gaussians", e.g., upon a scientific type query it returns something like Huberized(HomMix(Normal)) or similar.

We never really pursued this in all rigor, but two possibilities could be worth thinking of: one is a sort of history where we store the calls generating the objects; the second (already implemented) is a "simplify" operation which makes an object forget its history and simply makes a discrete/abscont/LebDec object out of it (for speed reasons, for instance).

RaphaelS1 commented 5 years ago

I think we could combine both ideas by having the .withSimplify parameter set to FALSE by default and making it local rather than global (this may already be the case, I can't remember off the top of my head).

Then a user can call a function like Huberize and simplify later if required e.g. via Huberize(Norm(), Lower, Upper) %>% Simplify()

Alternatively if a user knows they don't care about the original distribution then, Huberize(Norm(), Lower, Upper, Simplify = T)

Arguably any method that acts on this that would benefit from an increase in speed/efficiency, could first call Simplify() anyway.

fkiraly commented 5 years ago

Any sufficiently advanced "simplify" is indistinguishable from a compiler...

I wouldn't prioritize it for the initial release.

RaphaelS1 commented 5 years ago

I think the more general question is whether the Huberize() wrapper should always construct an object of class HuberizedDistribution when it could be ContinuousDistribution, which is arguably the 'simpler' class. I think the former should be the case, and then later down the line we can look at simplification (which I think is what you're suggesting).

PeterRuckdeschel commented 5 years ago

Well, at least whenever the huberization points become "active", i.e. if they lie within the support of the starting/input AbscontDistribution, the result of Huberize() will always have mass points at these points, so the result will no longer be an AbscontDistribution; of course, in the border case where the huberization is not effective, i.e. the result is the original starting/input distribution, one should return it unchanged.
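This is easy to check numerically in base R: huberizing clamps values to [lower, upper], so the tail mass piles up as point masses with P(H = lower) = F(lower) and P(H = upper) = 1 - F(upper); e.g. for a standard Normal huberized at [-1, 1] (no wrapper classes assumed, just a clamped sample):

```r
# Huberizing clamps samples to [lower, upper], so all tail mass collects
# as point masses at the bounds.
lower <- -1; upper <- 1
p_lower <- pnorm(lower)      # theoretical mass at the lower bound
p_upper <- 1 - pnorm(upper)  # theoretical mass at the upper bound

set.seed(42)
h <- pmin(pmax(rnorm(1e5), lower), upper)  # huberized sample
mean(h == lower)  # empirical mass at -1, close to pnorm(-1) ~ 0.159
```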

RaphaelS1 commented 5 years ago

And as a more general point then, if someone were to try a method outside of a distribution's support (e.g. truncate/huberize with limits outside the support limit or similar) should we:

a) Return the original distribution with a warning (e.g. "This method has no effect")
b) Return the original distribution without a warning
c) Prohibit this from happening in the first place by using validation tests like "do the limits lie in the support?"
d) Return the truncated/huberized wrapper class as requested but with the original distribution information

fkiraly commented 5 years ago

Regarding the class of huberized distribution: I think it should be "HuberizedDistribution", in line with how sklearn wrappers behave. By inheritance it would also be a descendant of the right kind of distribution, no?

Regarding the warning, I think it should raise an error if the type is incompatible - e.g., if you try to (univariate) huberize a discrete distribution taking values not in the reals, or a multivariate distribution.

Otherwise, i.m.o., it should do nothing - it is a general point of design to not show warning messages of the type "you are doing something stupid, dear user", since there are many many potential situations that would warrant a similar message.

RaphaelS1 commented 5 years ago

Okay that makes sense. So I believe we have discussed through this issue and in summary:

  1. truncate() and huberize() are constructors taking as arguments the original distribution and lower/upper limits, they have three possible outputs:
    1. A warning message if type is incompatible
    2. The original distribution if they have no effect
    3. An instance of class TruncatedDistribution or HuberizedDistribution which are wrappers inheriting from one of discrete/continuous/mixed distribution.
  2. In construction of a truncated/huberized distribution the relevant methods will be edited as required and both of these classes will give access to the original distribution
  3. We will not have a generic WrappedDistribution as the only method we would add would be something like getOriginalDistribution but this is more work than adding this to the individual wrapped classes
  4. MixtureDistribution inherits directly from one of UniVarDistribution (etc.) class
  5. LebDecDistribution inherits directly from MixtureDistribution

The only thing we haven't fully discussed is conditional distributions. I think it may make sense to give these the same hierarchy structure as HuberizedDistribution/TruncatedDistribution, e.g.

Distribution <- UnivariateDistribution <- DiscreteDistribution <- ConditionalDistribution

If anyone disagrees with this setup for conditional distributions or wants to add anything to the above, please let me know; if not, I'll close this issue and add the above to the design documentation.

PeterRuckdeschel commented 5 years ago

Conditional distributions would be more complicated than this -- any univariate distribution class may arise as a conditional distribution (together with information on the actual condition); so the rightmost class in this chain would be ConditionalDiscreteDistribution (or, a matter of taste, DiscreteConditionalDistribution).

PeterRuckdeschel commented 5 years ago

Regarding the class of huberized distribution: I think it should be "HuberizedDistribution", in line with how sklearn wrappers behave. By inheritance it would also be a descendant of the right kind of distribution, no?

Regarding the warning, I think it should raise an error if the type is incompatible - e.g., if you try to (univariate) huberize a discrete distribution taking values not in the reals, or a multivariate distribution.

Otherwise, i.m.o., it should do nothing - it is a general point of design to not show warning messages of the type "you are doing something stupid, dear user", since there are many many potential situations that would warrant a similar message.

Adding to this: in a chain of automatic evaluations, it may not even be "stupid" at all to have some truncations/huberizations which become effective and others which don't. So just returning the unchanged input without any warning is exactly what is expected. This is really essential when setting up arithmetic on distributions (as we do in distr), where you want to ensure closedness of the operations, in the sense that the return-value class is always covered by the set of implemented distribution classes.

RaphaelS1 commented 5 years ago

Conditional distributions would be more complicated than this -- any univariate distribution class may arise as a conditional distribution (together with information on the actual condition); so the rightmost class in this chain would be ConditionalDiscreteDistribution (or / matter of taste DiscreteConditionalDistribution).

Sorry this was just lazy writing of me, "DiscreteConditionalDistribution" is what I meant!

RaphaelS1 commented 5 years ago

See the UML diagram for full details of this implementation; if anyone has anything to add, let me know, if not I will close this issue.