probml / pyprobml

Python code for "Probabilistic Machine learning" book by Kevin Murphy
MIT License

Convert mixPpcaDemoNetlab to python #230

Closed mjsML closed 3 years ago

mjsML commented 3 years ago

This is figure 20.12 in the book. Its MATLAB code is here.

ashishpapanai commented 3 years ago

Hello @mjsML, I would like to work on this issue.

mjsML commented 3 years ago

> Hello @mjsML, I would like to work on this issue.

Please go ahead, you can look at #99 for instructions. When you submit a pull request, kindly mention the issue number and ask for a review from @murphyk and myself. Thanks!

ashishpapanai commented 3 years ago

I have a few queries @mjsML,

if ~exist('gmm')
   error('must install netlab from http://www.ncrg.aston.ac.uk/netlab/index.php')
end

1. In these lines, are we checking for a variable named 'gmm' or for a file? I tried to check for a local or global variable with this name in Python by running:

if 'gmm' not in locals():
    raise Exception("must install netlab from http://www.ncrg.aston.ac.uk/netlab/index.php")

I got the exception raised in the second line, and the suggested website is down.

2. Is gmminit a built-in MATLAB function, or is it defined in some other module in the scripts folder? `mix = gmminit(mix, data, options);`

3. Is gmmem a built-in MATLAB function, or is it defined in some other module in the scripts folder? `[mix, options, errlog] = gmmem(mix, data, options);`
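(For context, a rough Python analogue of the `gmminit` + `gmmem` pair is sklearn's `GaussianMixture`, which runs k-means initialisation followed by EM internally. This is a hedged sketch with made-up toy data, not code from the original MATLAB script.)

```python
# Sketch: netlab's gmminit (k-means init) followed by gmmem (EM fitting)
# roughly corresponds to fitting sklearn's GaussianMixture, which does both.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))  # placeholder data, stands in for the demo's dataset

gm = GaussianMixture(n_components=3, init_params="kmeans",
                     max_iter=100, random_state=0)
gm.fit(data)  # k-means initialisation + EM, like gmminit + gmmem
print(gm.converged_, gm.means_.shape)
```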

mjsML commented 3 years ago

@murphyk also here 🙂

murphyk commented 3 years ago

All the netlab code is here. The command "if ~exist('gmm')..." just checks whether the gmm() function is on Matlab's global path (since Matlab does not have a proper package management system...)
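(A hedged aside: the closest Python analogue of that Matlab `exist` check is testing whether a package can be imported, e.g. with `importlib.util.find_spec`. The helper name and the package checked below are illustrative choices, not part of the original code.)

```python
# Sketch: Python analogue of Matlab's "if ~exist('gmm') ... error(...)".
# Instead of probing the global path, we ask whether a package is importable.
import importlib.util

def require(package, hint):
    """Raise ImportError with an install hint if `package` is not importable."""
    if importlib.util.find_spec(package) is None:
        raise ImportError(f"must install {hint}")

require("json", "the stdlib json module")  # stdlib module, so this passes
```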

But rather than trying to get the old matlab code running, you could just try to reproduce the figure itself (or something similar) using a python implementation of the algorithm. I don't think it's in sklearn, but a quick google search for "mixtures of PPCA" reveals these implementations (WTFPL license):

I don't know if either of those work. If not, you will have to implement it yourself - it's a good exercise (but challenging!). I briefly describe the method in the book, but the details are in Tipping's original paper.

If you want a clean implementation of 'vanilla' PPCA using EM, see this Python script by @gerdm . It should be fairly easy to modify this to the mixture case. (Replace numba.jit with jax.jit :)
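(To make the suggestion concrete, here is a hedged NumPy sketch of 'vanilla' PPCA fitted by EM, using the closed-form updates on the sample covariance from Tipping & Bishop's paper. Variable names and the toy data are my own; this is not @gerdm's script.)

```python
# Sketch: EM for probabilistic PCA (Tipping & Bishop closed-form updates).
# M = W'W + sigma2*I;  W_new = S W (sigma2*I + M^-1 W' S W)^-1;
# sigma2_new = trace(S - S W M^-1 W_new') / d.
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    n, d = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)   # MLE sample covariance (d x d)
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, q))                   # factor loading matrix
    sigma2 = 1.0                                  # isotropic noise variance
    for _ in range(n_iter):
        M = W.T @ W + sigma2 * np.eye(q)          # q x q, cheap to invert
        Minv = np.linalg.inv(M)
        SW = S @ W
        W_new = SW @ np.linalg.inv(sigma2 * np.eye(q) + Minv @ W.T @ SW)
        sigma2 = np.trace(S - SW @ Minv @ W_new.T) / d
        W = W_new
    return mu, W, sigma2

# Toy check: 5-d data generated from a 2-d latent space plus small noise.
rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
mu, W, sigma2 = ppca_em(X, q=2)
print(W.shape, sigma2)
```

Extending this to the mixture case means adding per-component responsibilities to the E-step, as in a standard Gaussian mixture.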

If you want to be fancy, you could try adding your code to sklearn. Specifically, add a covariance_type='low-rank' option to the sklearn.mixture.GaussianMixture class. But that would be a lot of work....

(Just as an FYI, see also the Pomegranate library and the Pyro library for related code.)