vbartle / MML-Companion

This is a companion to the "Mathematical Foundations" section of the book Mathematics for Machine Learning by Marc Deisenroth, Aldo Faisal, and Cheng Ong, written in Python for Jupyter Notebook.
https://mml-book.github.io/external.html

6.4 Summary Statistics and Independence -> Example 6.4 could be wrong. #1

Open lthiet opened 4 years ago

lthiet commented 4 years ago

Hello.

I noticed something that could be wrong in your notebook where you replicate Example 6.4.

I'm not quite sure what you did here:

first = np.round(np.random.multivariate_normal(mean1, cov1, int(n/4))*.4,3) # n/4 to adjust distribution to book figure for contour plot.
second = np.round(np.random.multivariate_normal(mean2, cov2, n)*.6,3)
data = np.vstack([first,second])

but this isn't a Gaussian mixture distribution that matches the book's description. The coefficients should not be applied to the random variables themselves but to their PDFs!

Furthermore, according to the book, the mean/expected value of a Gaussian mixture is given by:

E[x] = alpha_1 * mu_1 + alpha_2 * mu_2

Plugging the corresponding means into the equation, the (analytical) mean should be:

E[x] = 0.4 * [10, 2] + 0.6 * [0, 0] = [4, 0.8]

Checking this against the plot in your notebook, the values don't match.

[Screenshot: plot of the sampled data from the notebook]

It looks like E[x] is around [0.7, 0.1].

Finally, the actual distribution PDF you describe is p(x) = .2 N1 + .8 N2, where N1 is the first Gaussian random variable with the transformation f(x) = 0.4 x applied, and N2 is the second with g(x) = 0.6 x applied.

- The mean of N1 is 0.4 * [10, 2] = [4, 0.8].
- The mean of N2 is 0.6 * [0, 0] = [0, 0].
- The mean of your actual distribution is 0.2 * mean_of_N1 + 0.8 * mean_of_N2 = 0.2 * [4, 0.8] = [0.8, 0.16], which is rather close to what's in your notebook!

The 0.2 and 0.8 coefficients come from your sample counts: with n = 3000 there are n/4 = 750 samples for N1 and n = 3000 samples for N2, so 750 / 3750 = 0.2 and 3000 / 3750 = 0.8.
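To check this numerically, here is a quick sketch. It uses the book's means [10, 2] and [0, 0]; the covariances are identity placeholders, since I don't have the notebook's actual cov1/cov2 at hand:

import numpy as np
np.random.seed(0)
n = 3000
mean1, cov1 = [10, 2], np.eye(2)  # placeholder covariance, not the notebook's
mean2, cov2 = [0, 0], np.eye(2)   # placeholder covariance, not the notebook's
# original scheme: scale the samples themselves by 0.4 / 0.6
first = np.random.multivariate_normal(mean1, cov1, int(n / 4)) * 0.4
second = np.random.multivariate_normal(mean2, cov2, n) * 0.6
data = np.vstack([first, second])
print(data.mean(axis=0))  # roughly [0.8, 0.16], matching the calculation above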

Here is the simple change I propose:

first = np.round(np.random.multivariate_normal(mean1, cov1, int(n*0.4)), 3)   # 40% of the samples from the first Gaussian
second = np.round(np.random.multivariate_normal(mean2, cov2, int(n*0.6)), 3)  # 60% from the second

Instead of applying the coefficients to the random variables, we apply them to the sample sizes. This should be analogous to applying the coefficients to the respective PDFs.

With those changes, we get this new plot :

[Screenshot: plot after the proposed change]

There might be some work needed on the contour lines, which I am not familiar with, but now the empirical mean matches the analytical one!
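As an aside, an equivalent (and perhaps more standard) way to sample the mixture is to pick a component for every single sample with probabilities 0.4 / 0.6. A quick sketch, again assuming the book's means and identity covariances as placeholders:

import numpy as np
np.random.seed(0)
n = 3000
mean1, cov1 = [10, 2], np.eye(2)  # placeholder covariance
mean2, cov2 = [0, 0], np.eye(2)   # placeholder covariance
labels = np.random.choice([0, 1], size=n, p=[0.4, 0.6])  # component choice per sample
samples = np.where((labels == 0)[:, None],
                   np.random.multivariate_normal(mean1, cov1, n),
                   np.random.multivariate_normal(mean2, cov2, n))
print(samples.mean(axis=0))  # roughly [4, 0.8], matching the analytical mean

This wastes a few draws, but it makes the role of the weights explicit: they control how often each component is responsible for a sample, not the scale of the samples.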

I could be wrong since I've only carefully read this particular section of the notebook, and am open to any discussion regarding this matter.

Best, Lam

vbartle commented 4 years ago

Ah this makes a lot of sense, thank you. Do you have an intuition for why applying the coefficient to the sample size is analogous to applying it to the pdf? Is this because the pdf is essentially a representation of the sample sizes?

The contours are based on the seaborn kernel density estimation function, which takes a bandwidth parameter (bw). It was 1; in addition to making your change, setting it to .5 gives a better contour visualization.
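For reference, something along these lines is what I mean (a rough sketch, not the exact notebook code; note that newer seaborn versions renamed the old bw argument to bw_adjust / bw_method):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = np.random.multivariate_normal([0, 0], np.eye(2), 1000)  # stand-in for the mixture samples
sns.kdeplot(x=data[:, 0], y=data[:, 1], bw_adjust=0.5)  # older seaborn versions: bw=0.5
plt.scatter(*data.mean(axis=0), color="red", marker="x")  # mark the empirical mean
plt.show()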

Also, I'm not sure where you are seeing "the actual distribution PDF you describe is p(x) = .2 N1 + .8 N2"? Can you reference a section in the book or notebook for me to take a look at?

Thank you again, looking forward to pushing these changes.


lthiet commented 4 years ago

Sorry for taking so long to reply. Do you still need answers?

vbartle commented 4 years ago

Yes please :) I'm looking forward to adding these changes and would like to include clear explanations with them.

lthiet commented 4 years ago

Part 1

Okay, I'm not an expert on probability and/or statistics, so I'll go with my intuition instead of a formal definition or proof. Let's say we have

p(x) = a p_1(x) + b p_2(x)

where p(x) is the PDF of a Gaussian mixture (GM), p_1(x) and p_2(x) are PDFs of Gaussians, and a and b are called mixture weights. I think it is these weights that are representative of the sample sizes.

To see intuitively why this holds, let's say we have a bunch of samples generated from two Gaussians, N1 and N2. We know that there are K1 samples from N1 and K2 samples from N2, so there are K = K1 + K2 samples in total. Now let's consider the samples as a whole; in other words, we pretend they have been drawn from a single distribution whose PDF we want to know.

What we know:

(Now this is the part where I don't really know how to explain in deeper detail :sweat_smile:, it just makes sense to me but I can't put it in better words. At least the expectations from my initial post check out, so it's a start.) If we look at one sample x and we know there is a probability a that x was drawn from N1, then we can say with probability a that x follows the PDF of N1. Similarly for N2.

Maybe if we look into an extreme, it can make sense.

For K1 != 0 and K2 = 0, i.e. every sample was drawn from N1, we have a = K1/K = K1/K1 = 1 and b = K2/K = 0. If we look at the PDF of this GM, we get p(x) = a p_1(x) + b p_2(x) = p_1(x), meaning the GM is essentially just N1. This is true because we've drawn samples only from N1.

I'm aware this is not the best explanation and is rather loose. I would gladly contribute to a more formal explanation, although I fear we might be bottlenecked in a few parts because it gets quite similar to trying to prove 1+1=2 :sweat_smile:
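One way to at least sanity-check the intuition numerically (a 1-D sketch with arbitrary parameters of my choosing, not a proof): pool K1 samples from N1 and K2 samples from N2, then compare a histogram of the pooled data against a * p_1(x) + b * p_2(x) with a = K1/K and b = K2/K. The two should line up:

import numpy as np
from scipy.stats import norm
np.random.seed(0)
K1, K2 = 2000, 8000                       # sample counts, so a = 0.2 and b = 0.8
a, b = K1 / (K1 + K2), K2 / (K1 + K2)
pooled = np.concatenate([np.random.normal(5, 1, K1),   # N1 = N(5, 1), arbitrary choice
                         np.random.normal(0, 1, K2)])  # N2 = N(0, 1), arbitrary choice
hist, edges = np.histogram(pooled, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2
mixture_pdf = a * norm.pdf(centers, 5, 1) + b * norm.pdf(centers, 0, 1)
print(np.abs(hist - mixture_pdf).max())  # should be small: the pooled samples follow the mixture PDF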

Part 2

When I say that the PDF you describe is p(x) = .2 N1 + .8 N2, it is because of

first = np.round(np.random.multivariate_normal(mean1, cov1, int(n/4))*.4,3) # n/4 to adjust distribution to book figure for contour plot.
second = np.round(np.random.multivariate_normal(mean2, cov2, n)*.6,3)

I assume that n is the sample size? We draw n/4 samples from the first Gaussian and n samples from the second, so there are n/4 + n = 5n/4 samples in total. Using the explanation above, this means a = (n/4) / (5n/4) = n / 5n = 0.2 and b = n / (5n/4) = 4n / 5n = 0.8.

Part 3

I hope this makes more sense; please do not hesitate to discuss this further. As you probably noticed, I would also like to build up to a more formal explanation :)