mattjj / pybasicbayes

MIT License

EM_demo question #28

Closed: jmgo closed this issue 8 years ago

jmgo commented 8 years ago

Hi!

I was looking into your EM_demo file and realized that when I use a Mixture (model), the BIC method returns a number, but when I do the same for a MixtureDistribution I get a list of floats. Shouldn't the Distribution also give a number (since it inherits the method from Mixture)?

Regards, Jorge

mattjj commented 8 years ago

Hey Jorge,

The implementation reason it's behaving differently is that the BIC method calls self.log_likelihood(data), which for Models would return a single number but for Distributions returns an array with the kth entry set to the log likelihood of data[k]. So even though Models and Distributions each have a log_likelihood method, those methods have different semantics, and those semantics are getting confused when BIC calls self.log_likelihood.
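To make the two semantics concrete, here is a minimal sketch in plain Python. The classes `ToyDistribution` and `ToyModel` are hypothetical stand-ins, not the pybasicbayes classes, and the BIC shown is the standard textbook form, which may differ in sign or scaling from the library's own method.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

class ToyDistribution:
    """Hypothetical distribution-style object: one log likelihood per datum."""
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def log_likelihood(self, data):
        # Entry k is the log likelihood of data[k].
        return [gaussian_logpdf(x, self.mu, self.sigma) for x in data]

class ToyModel:
    """Hypothetical model-style object: one log likelihood for the whole dataset."""
    num_parameters = 2  # mean and standard deviation

    def __init__(self, mu, sigma):
        self.dist = ToyDistribution(mu, sigma)

    def log_likelihood(self, data):
        # Sum the per-datum values into a single number.
        return sum(self.dist.log_likelihood(data))

    def bic(self, data):
        # Standard BIC: k * ln(n) - 2 * ln(L). It needs a scalar log likelihood.
        return self.num_parameters * math.log(len(data)) - 2 * self.log_likelihood(data)

data = [0.1, -0.4, 1.3, 0.7]
model = ToyModel(0.0, 1.0)
per_datum = model.dist.log_likelihood(data)  # list with one entry per datum
total = model.log_likelihood(data)           # single float
```

If the per-datum values were a NumPy array and were fed straight into the BIC arithmetic, broadcasting would yield one value per datum, which is consistent with getting an array back instead of a number.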

However, I don't think it makes sense for MixtureDistribution to inherit BIC at all. When would we want to call BIC on a Distribution? I think it may only make sense for Models, and the fact that MixtureDistribution inherits it from Mixture at the moment is just a bit messy.

What do you think? Is there a reason you want to call BIC on a Distribution?

jmgo commented 8 years ago

Ok, thanks for the reply.

At the beginning I thought models and distributions were the same thing, but now I see that they behave differently. I have to say I'm not very clear on the differences. Nonetheless, what I want to do is use a mixture of Gaussians and use BIC to find the number of components; the best model will then be used in an HMM.

For that I can just use the Mixture (model) to find the best model and then create a MixtureDistribution (with the params of the best model) for the HMM, right?
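As a rough illustration of that workflow, the sketch below fits 1-D Gaussian mixtures with 1 to 3 components using a small hand-written EM loop, scores each with the standard BIC, and keeps the parameters of the best one. It does not use pybasicbayes: `fit_gmm_1d` and `bic` are hypothetical helpers written for this example, the data are synthetic, and how the selected parameters are passed on to an HMM's emission distribution is left out because it depends on the library's constructors.

```python
import math
from statistics import NormalDist

def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def point_loglikes(data, weights, means, variances):
    """For every datum, the list of log(weight * density) per component."""
    return [[math.log(w) + normal_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)] for x in data]

def fit_gmm_1d(data, k, iters=100):
    """Plain EM for a k-component 1-D Gaussian mixture (hypothetical helper)."""
    n = len(data)
    srt = sorted(data)
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities of each component for each datum.
        resp = []
        for logs in point_loglikes(data, weights, means, variances):
            z = logsumexp(logs)
            resp.append([math.exp(l - z) for l in logs])
        # M-step: re-estimate weights, means and variances (with small floors).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-3)
    loglike = sum(logsumexp(logs)
                  for logs in point_loglikes(data, weights, means, variances))
    return (weights, means, variances), loglike

def bic(loglike, k, n):
    """Standard BIC (lower is better); a 1-D mixture has 3k - 1 free parameters."""
    return (3 * k - 1) * math.log(n) - 2 * loglike

# Synthetic, deterministic data: two clusters built from normal quantiles.
z = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
data = [-5 + v for v in z] + [5 + v for v in z]

fits, scores = {}, {}
for k in (1, 2, 3):
    fits[k], loglike = fit_gmm_1d(data, k)
    scores[k] = bic(loglike, k, len(data))

best_k = min(scores, key=scores.get)
best_weights, best_means, best_variances = fits[best_k]
```

The weights, means and variances of the selected fit are what would then be used to construct the mixture emission distribution for the HMM.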

Regards, Jorge

mattjj commented 8 years ago

Yeah that would work. There are other ways, too. This comment might help with the difference between distributions and models.

jmgo commented 8 years ago

Ok. Thank you! I think now I understand the main difference between model and distribution.

Congrats on the work! I gotta say that if this package had good documentation, it could be the no. 1 go-to package for H(S)MMs in Python.

Regards, Jorge

mattjj commented 8 years ago

Thanks, glad you like it!