Open nrlucaroni opened 11 years ago
I don't understand why adding the following does not work:
module type MultivariateDistribution = sig
include BaseDistribution with type elt := float array
val dimension : t -> int
end
with error message,
Error: Only type constructors with identical parameters can be substituted.
but the following does,
module type MultivariateDistribution = sig
type vector = float array
include BaseDistribution with type elt := vector
val dimension : t -> int
end
Abstracting the elt type in Mean and similar signatures sounds good. However, this won't be enough to support multivariate distributions.
I'm unsure what the best way to approach this is, but the first thing that comes to mind isn't very elegant:
module type UnivariateDistribution = sig
type t
type elt = float
include BaseDistribution with type t := t and type elt := elt
end
module type MultivariateDistribution = sig
type t
type elt
include BaseDistribution with type t := t and type elt := elt
end
(* And, the boilerplate for discrete-continuous cases. *)
The reasons we currently keep the discrete and continuous cases separate are:
We use labels to indicate the type of the argument for probability and cumulative_probability, so simply abstracting the type of the random variable won't work. Example:
Normal.(cumulative_probability ~x:0.42 standard)
Poisson.(cumulative_probability ~n:10 (create ~rate:0.42))
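To make the label problem concrete, here is a minimal sketch with two toy modules. Uniform and Bernoulli are hypothetical stand-ins written for this example, not the library's actual modules. The continuous one takes ~x:float and the discrete one takes ~n:int, so a single shared signature would have to pick one label name for both.

```ocaml
(* Hypothetical continuous distribution: the argument is labelled ~x. *)
module Uniform = struct
  type t = { lower : float; upper : float }
  let create ~lower ~upper = { lower; upper }
  let cumulative_probability { lower; upper } ~x =
    if x <= lower then 0.0
    else if x >= upper then 1.0
    else (x -. lower) /. (upper -. lower)
end

(* Hypothetical discrete distribution: the argument is labelled ~n. *)
module Bernoulli = struct
  type t = { p : float }
  let create ~p = { p }
  let cumulative_probability { p } ~n =
    if n < 0 then 0.0 else if n < 1 then 1.0 -. p else 1.0
end

(* The label is part of the function type, so these two functions
   cannot share one [val cumulative_probability : t -> x:elt -> float]. *)
let a = Uniform.(cumulative_probability ~x:0.25 (create ~lower:0.0 ~upper:1.0))
let b = Bernoulli.(cumulative_probability ~n:0 (create ~p:0.25))
```

Abstracting only the element type would still leave the label mismatch, unless both cases agreed on a single label name.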
As for the compiler error, I've never seen this one before. I think we should ask for clarification on the mailing list.
Update: compiler error is documented here:
There are a number of restrictions: [...] the definition must be either another type constructor (with identical type parameters).
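The restriction can be seen in isolation with a small self-contained sketch. The BaseDistribution below is a hypothetical stand-in with a single sample function, not the library's real signature, and Point is a toy module added only to show that the resulting signature is usable. The alias workaround compiles because vector is a type constructor with the same (empty) parameter list as elt. As far as I know, newer compilers (OCaml 4.06 and later) relaxed this rule and also accept substituting float array directly, but treat that as something to verify against your compiler version.

```ocaml
(* Hypothetical stand-in for the library's BaseDistribution. *)
module type BaseDistribution = sig
  type t
  type elt
  val sample : t -> elt
end

module type MultivariateDistribution = sig
  (* Naming the array type gives a constructor that may be substituted. *)
  type vector = float array
  include BaseDistribution with type elt := vector
  val dimension : t -> int
end

(* Toy "distribution" that always returns a copy of its stored point. *)
module Point : MultivariateDistribution with type t = float array = struct
  type t = float array
  type vector = float array
  let sample p = Array.copy p
  let dimension = Array.length
end

let d = Point.dimension [| 1.0; 2.0; 3.0 |]
```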
I've tried to generalize distribution signatures, so now each distribution also has an elt type. However, I'm unsure what to do with remaining signatures. For instance, Mean:
module type Mean = sig
type elt
type t
val mean : t -> elt
end
Most discrete distributions have real means, so we can't just include Mean with type elt := elt, and including Mean with different types seems hackish to me. What do you think?
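For concreteness, the "different types" variant might look roughly like the sketch below. DiscreteDistribution, probability, and the Bernoulli module are assumptions made up for this example. The distribution's own elt stays int, while Mean is included with its elt substituted by float.

```ocaml
module type Mean = sig
  type elt
  type t
  val mean : t -> elt
end

(* Hypothetical discrete signature: samples are ints, the mean is a float. *)
module type DiscreteDistribution = sig
  type t
  type elt = int
  val probability : t -> n:elt -> float
  (* Mean's own elt is replaced by float, independently of elt above. *)
  include Mean with type t := t and type elt := float
end

module Bernoulli : sig
  include DiscreteDistribution
  val create : p:float -> t
end = struct
  type t = { p : float }
  type elt = int
  let create ~p = { p }
  let probability { p } ~n =
    if n = 1 then p else if n = 0 then 1.0 -. p else 0.0
  let mean { p } = p
end

let m = Bernoulli.(mean (create ~p:0.25))
```

This type-checks, but every feature signature then needs its own decision about which type to substitute, which is probably where the hackish feeling comes from.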
Yeah that's a tough one.
Actually, what do you think about switching to objects for distributions? That way we can get rid of all of the micro-signatures, like Mean, Variance etc., because we have row polymorphism for objects:
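(* An open object type needs its row variable bound as a type parameter. *)
type ('a, 'r) mean = < mean : 'a; .. > as 'r
type ('a, 'r) mean_opt = < mean_opt : 'a option; .. > as 'r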
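A minimal sketch of how the object approach could work in practice. The normal and poisson constructors and the mean_of helper are hypothetical names invented for this example. Any object that has a mean method is accepted, regardless of its other methods.

```ocaml
(* Hypothetical object-based distributions. *)
let normal ~mu ~sigma = object
  method mean = mu
  method variance = sigma *. sigma
end

let poisson ~rate = object
  method mean = rate
  method variance = rate
  method rate = rate
end

(* Row polymorphism: works for any object with a [mean] method. *)
let mean_of (d : < mean : 'a; .. >) : 'a = d#mean

let m1 = mean_of (normal ~mu:1.5 ~sigma:2.0)
let m2 = mean_of (poisson ~rate:0.42)
```

No Mean or Variance signature is declared anywhere here. The requirement is expressed only in the type of mean_of.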
Okay, I've chosen to stick with modules for now. A multivariate normal distribution can be expressed as:
module MultiNormal : sig
type elt = float array
include BaseDistribution with type elt := elt
include Features with type t := t and type elt := elt
include MLE with type t := t and type elt := elt
end
However, I'm unsure if we should focus on this now: neither SciPy nor R provides multivariate distributions out of the box. So maybe we should delay this until later?
I prefer modules too. I thought R/scipy provided a fairly full distribution suite, but I see that (looking at [1] and [2]) they have only a few basic ones as you've pointed out. I think at least allowing some generality to implement them is important along with a few basic ones.
[1] - http://docs.scipy.org/doc/numpy/reference/routines.random.html
[2] - http://cran.r-project.org/web/views/Distributions.html
Well, for SciPy a list of supported distributions is a little longer, but still, all of them are univariate.
Is there a way to abstract the 'float' in the distribution modules so that they also cover 'float array' (or other data types)? And is that sufficient to extend distributions to multivariate ones? Something like...