walkccc / CLRS

πŸ“š Solutions to Introduction to Algorithms Third Edition
https://walkccc.me/CLRS
MIT License
4.65k stars 1.26k forks source link

Exercise 5.4-2 #387

Open yjian012 opened 3 years ago

yjian012 commented 3 years ago

There are a few issues with the solution. First let's repeat the question and the current solution.

5.4-2 Suppose that we toss balls into b bins until some bin contains two balls. Each toss is independent, and each ball is equally likely to end up in any bin. What is the expected number of ball tosses?

This is just a restatement of the birthday problem. I consider this all that needs to be said on this subject.

First thing is, the question is about the expected value, not the probability. But we can leave this for later. The real question is, is this just "a restatement of the birthday problem"? To find the expected value, we need to find the probability that "The moment after we threw the _k_th ball, some bin contains two balls". On the other hand, the birthday problem in this context is about the probability that "After we threw k balls, some bin contains two or more balls". So apparently the first probability is strictly less than the second one for any k>=3, because the events in the first case is a proper subset of the events in the second case.

So first, the first k-1 balls must be in different bins, and the k th must be in one of them. The expected value is $\sum{k=2}^{n+1}k\frac{P(n, k-1)}{n^{k-1}}*\frac{k-1}{n}$, which equals $\sum{k=2}^{n+1}k(k-1)\frac{n!}{(n-(k-1))!n^k}$, or rather $\sum_{k=1}^n k(k+1)\frac{n!}{(n-k)!n^{k+1}}$ if we replace $k$ with $k+1$. Here $P(n,k)$ is permutation.

Another way to think about it is, the first probability $Pr(k)$ can be written as $Pr(k)=Pr'(k)-Pr'(k-1)$, where $Pr'(k)$ is the second probability, which can be written as $Pr'(k)=1-\frac{P(n,k)}{n^k}$. Either way, we end up with the same expression.

Where do we go from here? I'm not quite sure. We have a summation of this form $\frac{1}{n}\sum_{k=1}^n k(k+1)\left((1-\frac{1}{n})\dots(1-\frac{k-1}{n})\right)$. I used the exponential approximation and then approximate it with integration, and I got the leading term ~$\sqrt{\frac{\pi n}{2}}$.

I found another solution here saying that

The expected number of ball tosses is 1 + \sum_{k = 1}^{b} \frac{b!}{(b - k)!b^k}, it's a birthday problem.

Well, he said the same thing, but still, it's not the same "birthday problem" as the one given in this book... Anyways, at least there is a reference. (BTW, the numeric solution to 5.4-1 there is correct, instead of the one given here, as this issue pointed out.) I'm not sure where this equation comes from so I followed the link, but I still don't know how to get this equation. I checked the reference "The Art of Computer Programming. Vol. 3, section 6.4 Hashing", but I didn't find this equation either. Apparently the function Q(M) also has leading term $\sqrt{\frac{\pi M}{2}}$, but right now I don't know if my result is equivalent to the one given on the Wikipedia page, 1+Q(M), and if so how to prove it.

Anyways, I don't think this is "just a restatement of the birthday problem". And if someone knows how to derive the 1+Q(M) solution, please let me know. (I proved the result above equals 1+Q(M), see the comment below.)

yjian012 commented 3 years ago

OK, I did the proof myself. Basically we want to prove $\sum{k=0}^n\frac{k(k+1)}{n^k(n-k)!}=\sum{k=0}^n\frac{n}{n^k(n-k)!}$. The proof goes as $\sum{k=0}^n\frac{k}{n^k(n-k)!}\ =\sum{k=1}^n\frac{k}{n^k(n-k)!}\ =\sum{k=0}^{n-1}\frac{k+1}{n^{k+1}(n-(k+1))!}\ =\frac{1}{n}\sum{k=0}^{n-1}\frac{(k+1)(n-k)}{n^k(n-k)!}\ =\frac{1}{n}\sum{k=0}^n\frac{(k+1)(n-k)}{n^k(n-k)!}\ =\frac{1}{n}\sum{k=0}^n\frac{n(k+1)-k(k+1)}{n^k(n-k)!}\ =\sum{k=0}^n\frac{k}{n^k(n-k)!}+\sum{k=0}^n\frac{1}{n^k(n-k)!}-\frac{1}{n}\sum{k=0}^n\frac{k(k+1)}{n^k(n-k)!}$ by eliminating the same term on both sides, we get the desired result. β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€”β€” AHhhh. How did I not see that. I already noticed that $Pr(k)=Pr'(k)-Pr'(k-1)$. So the expected value is just $\sum kPr(k)=\sum kPr'(k)-kPr'(k-1)$ and they will cancel out... $\sum{k=2}^{n+1} kPr(k)\ =\sum{k=2}^{n+1} kPr'(k)-kPr'(k-1)\ =\sum{k=2}^{n+1} kPr'(k)-kPr'(k-1)\ =\sum{k=2}^{n+1} k(1-\frac{P(n,k)}{n^k})-k(1-\frac{P(n,k-1)}{n^{k-1}})\ =\sum{k=2}^{n+1} -k\frac{P(n,k)}{n^k}+k\frac{P(n,k-1)}{n^{k-1}}\ =\sum_{k=1}^n \frac{P(n,k)}{n^k}+\frac{P(n,1)}{n^1}\ =1+Q(n,k)$ Now I get it...