weights in multinomial - Githubissues

nilsbecker commented 8 years ago

a nitpick: the weights vector is summed up and tested for summing to 1. if you're gonna sum it up anyway, why not allow arbitrary positive weights and normalize the weight vector?

oml/src/lib/stats/sampling.ml line 64

rleonid commented 8 years ago

I think that you're raising a good point, but in this case the weight array represents the probabilities of a multinomial distribution, so I think the correct 'contract' for this method to enforce would be to check that the weights are between 0 and 1 (and sum to 1). For comparison softmax allows arbitrary weights because the method is one way of transforming an arbitrary set of values into a probability distribution.

I've been thinking a lot recently about making sure that the type interface is as accurate as possible to the underlying mathematical representation (without going "full" Haskell on the library). How can we make this clearer?

Could we define a probability type so that multinomial becomes val multinomial : ?seed:int array -> probability array -> int generator
Maybe renaming "weights" to "probabilities" in the documentation?

nilsbecker commented 8 years ago

hmm, i do see your point of enforcing the input to be a type that it naturally should be. i was more thinking about efficiency. now, the client has to generate the weights, and presumably, has to normalize them to 1 to comply with the interface. then the function does the same computation again. waste of effort.

i guess one could invent a weightvector type which when constructed, normalizes itself. sounds reasonable. and it would obviate the need to check again in the function. but this could not be a 'a array since there would be no way to enforce the normalization (right?). at the same time one would probably also want to do some arithmetic with such a normalized weight vector. for that maybe an array_of_weightvector would be enough. or a private type? (i don't have experience with those)

rleonid commented 8 years ago

Those are good points, let me ruminate on a solution. I think that a private float array might be what we're looking for.

rleonid commented 8 years ago

@nilsbecker Check out the prob_module branch:

Specifically there is now a method Probability.normalize to wrap the weights, so that multinomial now takes a Probability.s.

There is probably a bit more work of adding methods to the main Probability module

all the uses in Classification should go here
convenience functions like map2 and fold2 for using these weights. but in general, I think a private float array should work here and allow users to want more flexibility in how they define these weights to access them via casting.

@struktured Any thoughts?

struktured commented 8 years ago

I have thoughts but only somewhat cohesive. Maybe you'll find some of it useful. I like the idea of a probability module. I've implemented a few over the years in other languages, although I'd hesitate to say I've ever seen a perfect version of one in code yet. My favorite representation is typically

polymorphic
immutable
sorted by likelihood
preserves a scale value

I understand why making it polymorphic could be troublesome for consuming apis but I really do enjoy a strong coupling between datatypes and probabilities. It's also very human readable. I rather see a Coin.t Probability.t opposed to some opaque indexed structure.

Sorted by likelihood is probably too expensive for large arrays, but maybe a convenience function to copy / transform the existing weight vector into a sorted structure when the user needs to would be helpful.

Also another thing to consider I as write this is that what you defined is a discrete multinomial probability vector, but a probability itself could be some sort of density function, if we want to pedantic about it. Maybe probability should be an abstract sig for which multinomial is one such implementation?

Preserving the scale value means keeping the weights always normalized, but maintaining one extra constant to rescale the weight vector into frequency counts. The pro of this is the vector is always normalized and thus cheap to query. The con is updating the distribution is more expensive (eg. rescale to frequencies, add the new frequency counts, unscale) and capable of numerical instability. And do we actually want to preserve the scale at all in a probability? Personally I find it more convenient to have some way to update the distribution when a new observation is given, and you must have it unnormalized or the scale factor must be preserved for this to possible, although the information doesn't have to be bound to the weight vector.

And this leads to final thought on this- should we have different "views" of a distribution, with types to indicate this? You could think of one representation leading itself well to mutation- imagine a distribution being repeatedly updated every time some event is fired, but that you want to take a snapshot of the distribution at some time slice and normalize it. Eg. you query a "distribution" for a (normalized?) "probability"- the distribution is the unnormalized entity and the probability is a normalized view.

Regarding map2/fold2, I also really think filter and filter_map are quite useful for probability vectors (in the normalized version filter should maintain a sum of 1.0 of course...)

Side note: @rleonid Great talk at compose on thursday!

rleonid commented 8 years ago

We might be talking about 2 different modules. I am proposing that

Probability have two types t (for one prob) and an s (for many).
a Distribution would encapsulate probability distributions and probably return the appropriate types from the Probability module.

Some of the issues that this raises:

It makes the signatures for some of the methods obvious.
It allows us to have one place to contain the float -> Probability.t transformation. Leading to hopefully (a) more efficient code and (b) more correct code.
It forces Oml user to use provided functions or
Access the underlying float or float array via casting from private types.
It will have knock-on effects in things like Distribution or the Probability logic found in Classification.

struktured commented 8 years ago

So one of my points was do you want probability to be (scaleValue * float array) where sum(float array)=1, or an unnormalized float array? Or just a normalized float array? What is the value of having it unnormalized by default? What types of algorithms will benefit from this? In order to sample, you would at least need a scale factor to avoid doing a full pass on it.

As I noted before, one advantage to unnormalized is its easier to update the distribution when getting new observations. But if that's not the case here, it makes for a stronger case to normalize it.

rleonid commented 8 years ago

I'm imagining just a normalized float array. By default we should not provide a mechanisms to update this distribution (so map or filter wouldn't make sense as they might break the normalization) . We would keep the type private so if someone really wanted to, they could change the values back to a regular float array with casting. Scanning through the code I think we change the signature for

Sampling.multinomail (already present in the prob_module branch).
Distributions.dirichlet_pdf
Distributions.ln_dirichlet_pdf

We would have an analogous type Probability.t, all of the _cdf functions in Distributions.t would return that.

Lastly, we should add a "non-parametric" distribution data structure to accommodate polymorphic "pdf"'s.

rleonid / oml

weights in multinomial #142