reneargento / algorithms-sedgewick-wayne

Solutions to all the exercises of the Algorithms book by Robert Sedgewick and Kevin Wayne
MIT License
2.26k stars 664 forks source link

exercise 3.1.15 has incorrect answer #222

Closed faame closed 2 years ago

faame commented 2 years ago

In 3.1.15

3.1.15
     Searches Percentage of total time spent on insertions
                 1000                                        0.00%
          1000000                                        2.20%
   1000000000                                        5.92%

It shows then N=10^9, the insertion time is 5.92% of the overall time.

This is incorrect. It should be close to 100%

we can use the existing conclusion that for BinarySearchST Assuming insert inserts distinct keys. There are M insertions, for M distinct keys

average insertion cost is N
----> total insertion cost is N*N = N^2

M Search is conducted on N keys, thus

----> total search cost is M*lgN

Per the question, M = N*1000

So

----> total cost = N^2 + M*lgN = N^2 + 1000N*lgN

The insertion percentage is

P=(N^2) / (N^2 + 1000N*lgN)

I think the idea is that the N^2 is the high-order item here. Thus when N become asymptotic, P should approach 100%.

In the example, largest M=10^9, so largest N=10^6

In that case the percentage should be

P  = (10^12) / ((10^12) + 1000*10^6*lg(10^6))
  ~= (10^12) / ((10^12)) + (10^9) * 20 
  ~= (10^12) / ((10^12)) 
     =100%

Yet your program experiment is 5.92% far from 100% I think one issue is here:

In 3.1.15 source code

            for(int insertion = 0; insertion < numberOfInsertions; insertion++) {
                int randomValueToInsert = StdRandom.uniform(100000);
                binarySearchSymbolTable.put(randomValueToInsert, "Value " + randomValueToInsert);
            }

when number of insertions is N=10^6, you are drawing random number from [0, 10^5]. Thus you will have significant number of duplicate numbers, resulting a much shorter symbol table (I..e effective N << 10^6)

I printed out the size of the symbol table after N-10^6 insertions, and the size is less than 10^4, which is far from the expected 10^6 size.

So I think the correct anwswer for 3.1.15 is to use formulas

P=(N^2) / (N^2 + 1000N*lgN)

Using N=10^9 as upper bound of random number generator

        int randomValueToSearch = StdRandom.uniform(1000000);  // search on the entire range of distinct keys
        int randomValueToInsert = StdRandom.uniform(1000000000); //use large number to avoid duplicates

I can see the program getting closer to 100%

     Searches Percentage of total time spent on insertions
    100000000                                       86.00%
reneargento commented 2 years ago

Good analysis. I re-ran the experiments with that update and M = 10^9 had a 96.26% insertion percentage for me. I updated the exercise here: https://github.com/reneargento/algorithms-sedgewick-wayne/commit/64793c56c70c7857db6ad596c7db9fe9667e3089 Thanks!