jmoenig / Snap

a visual programming language inspired by Scratch
http://snap.berkeley.edu
GNU Affero General Public License v3.0

Streams library "sieve" block producing primes is NOT a Sieve of Eratosthenes... #3412

Open GordonBGood opened 1 day ago

GordonBGood commented 1 day ago

In the help notes for this "sieve" block included in the demos for the Streams library, you say "It's called SIEVE because the algorithm it uses is the Sieve of Eratosthenes: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes ", yet when one reads the linked Wikipedia article (at least as updated over the last 15 years or so), one finds the following statement: "The widely known 1975 functional sieve code by David Turner[13] is often presented as an example of the sieve of Eratosthenes[7] but is actually a sub-optimal trial division sieve.[2]".

That David Turner sieve is exactly what you have implemented; expressed in Haskell, it is as follows:

primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]

or, closer to how you have written it in Snap! (using Haskell's "filter" in place of the equivalent Snap! "keep"):

primes = sieve [2..]
sieve (p : xs) = p : sieve (filter (\ c -> c `mod` p > 0) xs)

This has much worse performance than a real functional "incremental" Sieve of Eratosthenes (SoE), so much so that your Snap! version is almost unusable, taking over 10 seconds on a fairly high-end desktop computer to find the primes just to a thousand. It is slow because, rather than advancing the culled composites for each base prime by addition of a constant span as the SoE does, it uses trial division (via the "mod" function) to test ALL remaining prime candidates for even division by all found primes. It therefore has roughly quadratic computational complexity rather than the O(n log n log log n) computational complexity of the true incremental SoE.
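To make the trial-division structure concrete, here is a hedged Python sketch of the Turner "sieve" using nested generators (my own illustration, not the Snap! library code; the name `turner_primes` is made up). Each newly found prime wraps the remaining candidate stream in yet another `mod` filter, which is exactly why every candidate ends up being divided by every smaller prime:

```python
from itertools import count, islice

def turner_primes():
    # David Turner's "sieve": each prime adds one more `mod` filter
    # around the rest of the candidate stream, so later candidates are
    # trial-divided by every earlier prime -- filtering, not culling.
    def sieve(s):
        p = next(s)
        yield p
        yield from sieve(x for x in s if x % p != 0)
    return sieve(count(2))

print(list(islice(turner_primes(), 10)))  # -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note that producing the k-th prime stacks k nested filters, which is where the cost (and, in Python, the recursion depth) comes from.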

Using the work from the article referenced by that Wikipedia page (O'Neill's "The Genuine Sieve of Eratosthenes"), one could write a true incremental SoE using a Priority Queue, as O'Neill preferred, but that would require the extra work of implementing the Priority Queue using either a binary tree structure or Snap!'s Lists in imperative mode (not functional linked-list mode). In the Epilogue of that article, reference is made to a "Richard Bird list-based version" of the SoE, which is much easier to implement. However, at that time this sieve had not yet been optimized to use "infinite tree folding", which reduces the cost of merging the base primes' composites to a computational complexity of about O(n log n) rather than the O(n sqrt n) without the folding, a very significant difference as prime ranges get larger. A refined version of this sieve (sieving odds only) in Haskell is as follows:

primes = 2 : oddprimes() where
  merge xs@(x : xtl) ys@(y : ytl) =
    case compare x y of
      EQ -> x : merge xtl ytl
      GT ->  y : merge xs ytl
      LT ->  x : merge xtl ys
  composites ((hd : tl) : rest) = hd : merge tl (composites $ pairs rest) where
    pairs (f : s : rest) = merge f s : pairs rest
  testprmfrm n cs@(c : rest) =
    if n >= c then testprmfrm (n + 2) rest else n : testprmfrm (n + 2) cs
  oddprimes() = (3 :) $ testprmfrm 5 $ composites $ map (\ bp -> [bp * bp, bp * bp + bp + bp .. ]) $ oddprimes()

In contrast to the David Turner sieve, which progressively sieves composite numbers from the input list by filtering the entire remaining list, this tree-folding algorithm builds a merged (and therefore ascending-ordered) lazy sequence of all of the composite values based on the recursively determined "oddprimes" function (non-sharing); the final output sequence of prime numbers is then all of the (in this case odd) numbers that are not in the composites sequence. It is relatively easy to keep the merged sequence of composites in ascending order, both because the sub-list of culls for each base prime is itself in order and because each new sub-sequence starts at a higher value than the last (at the square of its base prime). For this fully functional "incremental" algorithm there is a cost in merging all of the sub-sequences, but the binary tree depth is limited: for instance, when finding the primes up to a million, the base primes sequence only goes up to the square root, i.e. a thousand, and there are only 167 odd primes up to a thousand, meaning that the binary tree only has a depth of about eight (two to the power of eight is 256) in the worst case (some merges won't need to traverse to the bottom of the tree).
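For readers who don't read Haskell, the same tree-folding algorithm can be sketched in Python with generators standing in for lazy lists (my own illustration under that assumption; the function names are made up and this is not the Snap! library code):

```python
from itertools import count, islice

def merge(xs, ys):
    # merge two ascending streams, dropping duplicates between them
    x, y = next(xs), next(ys)
    while True:
        if x < y:
            yield x
            x = next(xs)
        elif y < x:
            yield y
            y = next(ys)
        else:               # equal heads: emit once, advance both
            yield x
            x, y = next(xs), next(ys)

def composites(ss):
    # ss yields one ascending multiples-stream per base prime;
    # fold them into a single ascending stream via a binary merge tree
    def pairs(ss):
        while True:
            yield merge(next(ss), next(ss))
    first = next(ss)
    yield next(first)
    yield from merge(first, composites(pairs(ss)))

def oddprimes():
    yield 3                 # must come first, before composites is touched
    cs = composites(count(bp * bp, 2 * bp) for bp in oddprimes())
    c = next(cs)
    n = 5
    while True:
        if n >= c:          # n is the next composite: skip it
            n += 2
            c = next(cs)
        else:               # below the next composite, so n is prime
            yield n
            n += 2

def primes():
    yield 2
    yield from oddprimes()

print(list(islice(primes(), 20)))
```

As in the Haskell, the recursive `oddprimes()` feed is non-sharing: each level of the tree re-derives the base primes it needs, which is cheap because only primes up to the square root of the current position are ever demanded.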

Now, since none of these sequence states refer back to previous states, there is no need to memoize the states as in a "lazy list" or as in your "streams" library; a simpler Co-Inductive Stream (CIS) can be used, as follows in Haskell:

data CIS a = CIS !a !(() -> CIS a)
primes = CIS 2 $! oddprimes where
  merge xs@(CIS x xtl) ys@(CIS y  ytl) =
    case compare x y of
      EQ -> CIS x $! \ () -> merge (xtl()) (ytl())
      GT ->  CIS y $! \ () -> merge xs (ytl())
      LT ->  CIS x $! \ () -> merge (xtl()) ys
  composites (CIS (CIS hd  tl) rest) = CIS hd $! \ () -> merge (tl()) (composites $! pairs (rest())) where
    pairs (CIS f ftl) = let (CIS s  rest) = ftl() in CIS (merge f s) $! \ () -> pairs (rest())
  allmults (CIS bp bptl) = CIS (cullfrmstp (bp * bp) (bp + bp)) $! \ () -> allmults (bptl()) where
    cullfrmstp c adv = CIS c $! \ () -> cullfrmstp (c + adv) adv
  testprmfrm n cs@(CIS c rest) =
    if n >= c then testprmfrm (n + 2) (rest())
    else CIS n $! \ () -> testprmfrm (n + 2) cs
  oddprimes() = CIS 3 $! \ () -> testprmfrm 5 $! composites $! allmults $! oddprimes()

The above Haskell code is written so as to be easier to understand when porting this "CIS" implementation to non-lazy languages; since it doesn't take advantage of Haskell's built-in laziness and bypasses the compiler's "strictness analyzer", it may be less efficient than the lazy list version above, in spite of not doing the extra work of memoizing each sequence value...

The above has been translated to Snap! as per the attached file and can be tested with "listncis 168 primes" to show the list of the primes up to a thousand in about a second; it shows the primes up to about ten thousand in a little over ten seconds, so its execution time grows roughly linearly with increasing range...

I recommend you fix your Streams demo to use an implementation of this algorithm, although as noted above, the algorithm doesn't really require the overhead of the Streams memoization...

brianharvey commented 1 day ago

I'm pretty sure Eratosthenes didn't have a computer, and the odds are he did have slaves to do the actual crossing out of numbers, so he was less interested in algorithmic efficiency than you are.
And if you think about what a real-life sieve is, it works by filtering. I'll reread the Wikipedia article, though.

But in any case, if I wanted a finite number of primes, I'd do it iteratively like everyone else, not with streams. The whole point of the demo is that you can get all the primes if you have lazy lists. It would defeat the (pedagogic) object of the exercise to use a more complicated algorithm that hides the beauty of this one.

I do wish the streams library implemented lazy lists more efficiently. If you have any words of wisdom about that, it'd be great!

GordonBGood commented 23 hours ago

> I'm pretty sure Eratosthenes didn't have a computer, and the odds are he did have slaves to do the actual crossing out of numbers, so he was less interested in algorithmic efficiency than you are. And if you think about what a real-life sieve is, it works by filtering. I'll reread the Wikipedia article, though.

Ah, but Eratosthenes was interested in algorithmic efficiency, which is why he came up with his Sieve algorithm that doesn't require dividing each and every current prime candidate by each and every one of the found base primes (which is what the David Turner version, and your Snap! version, does); he could just have instructed his slaves to cull by advancing by a constant span from the square of each base prime, for each base prime up to the square root of the current range. His sieve didn't work by filtering. Yes, the usual implementation does cull all the composites up to a range and then process the remaining primes, but there is no reason it can't be done incrementally, which is what functional incremental versions do. Imagine Eratosthenes with one slave per base prime (adding slaves as the sieved range increases): upon finding a new base prime, he would assign the square of that base prime and an increment of that base prime to a slave, then tell each slave to advance by his increment to just higher than the current candidate; the candidate is prime if it is less than the current cull values held by all slaves, and one new slave is added only when the most recently added slave first needs to advance. In that way, each slave only has to remember two numbers: his current culling place and how much to advance when that place is passed. On top of that, the infinite tree-folding algorithm is a way of deciding which composite number is next for consideration by organizing the slaves in pairs, with each result forwarded upward to pairs of pairs, then pairs of pairs of pairs, etc.
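The slave-per-prime model above is essentially O'Neill's priority-queue formulation of the incremental SoE; a hedged Python sketch (my own illustration, with a made-up name, using the standard-library heap rather than anything from Snap!):

```python
import heapq
from itertools import count, islice

def pq_primes():
    # One heap entry per base prime, holding exactly the two numbers a
    # "slave" needs: his current culling place and how far to advance.
    yield 2
    heap = []                          # entries: (next_cull, increment)
    for n in count(3, 2):              # odd candidates only
        if not heap or n < heap[0][0]:
            yield n                    # below every culling place: prime
            heapq.heappush(heap, (n * n, 2 * n))
        else:
            while heap[0][0] <= n:     # advance every slave sitting at n
                cull, adv = heapq.heappop(heap)
                heapq.heappush(heap, (cull + adv, adv))

print(list(islice(pq_primes(), 20)))
```

No division happens anywhere: each "slave" advances only by repeated addition of his constant increment, starting from the square of his base prime, which is the defining property of the genuine sieve.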

This algorithm does kind of work by filtering in that the "testPrimeFrom" block compares all prime candidates in turn (only odd in this implementation) and keeps only the ones that aren't in the stream of composite culls. This is exactly how an efficient incremental version of the SoE works...

> But in any case, if I wanted a finite number of primes, I'd do it iteratively like everyone else, not with streams. The whole point of the demo is that you can get all the primes if you have lazy lists. It would defeat the (pedagogic) object of the exercise to use a more complicated algorithm that hides the beauty of this one.

Your David Turner algorithm is beautiful only if you consider short as beautiful, with no consideration of computational complexity. In those terms it is extremely ugly, with its roughly quadratic computational complexity; and the incremental SoE doesn't need lazy lists, as non-memoizing Co-Inductive Streams are all that is necessary, and they are much easier (and faster by a constant factor) to implement. Even when compiled on very fast computers, the David Turner sieve is never practical for finding primes much beyond, say, a hundred thousand, which is a trivial range; and functional algorithms such as this can't be multi-threaded, as each new prime depends on the state of previous computations...

Having all of the primes as an infinite stream is not really a consideration in that, even if it were possible with infinite memory, one isn't going to use them all for anything; one usually just wants to count them, to find and count patterns such as pairs or "k-tuples", to work with slices of them, etc.; most prime processing consumes a stream of primes while letting the beginning of the stream evaporate once it is no longer needed for, say, the pattern being sought...

Also, I think you missed the point of my alternate implementation: it also produces an infinite stream of primes as it advances, just that it doesn't memoize any primes other than the base primes captured by the closures used in merging the individual base prime culling streams. I converted it to a finite sized list only in order to be able to easily display the results.

> I do wish the streams library implemented lazy lists more efficiently. If you have any words of wisdom about that, it'd be great!

There is a way of implementing laziness using only closures, which I didn't present here because memoization isn't necessary for the tree-folding primes algorithm (and not memoizing makes it faster by a constant factor). However, one can implement a "lazy" type just by using functional closures to capture the internal state: whether the computation has been done yet, the value once computed, and the "thunk" to be executed once in order to obtain that value. The following are images of block definitions to do that, which may be faster than the list manipulations you are doing now: MakeLazy GetLazyValue

Once one has defined the lazy type handling, one can combine it with my current CIS non-memoizing stream in order to get a memoizing lazy list type, as in the following images: MakeLazyList HeadLazyList TailLazyList

It might be slightly faster to inline the make/get lazy blocks into the making of the lazy list block and the obtaining of the tail block, here separated for clarity...
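The closure-captured lazy state described above can be sketched in Python as follows (my own illustration of the idea behind the MakeLazy/GetLazyValue blocks; the names `make_lazy` and `force` are made up):

```python
def make_lazy(thunk):
    # The "done yet" flag, the cached value, and the one-shot thunk all
    # live inside this closure; forcing twice runs the thunk only once.
    done = False
    value = None
    def force():
        nonlocal done, value, thunk
        if not done:
            value = thunk()
            done = True
            thunk = None      # release the thunk and whatever it captured
        return value
    return force

calls = []
lz = make_lazy(lambda: (calls.append(1), 42)[1])
print(lz(), lz(), len(calls))  # the thunk ran exactly once
```

Wrapping each CIS tail thunk with `make_lazy` in this way is all it takes to turn the non-memoizing stream into a memoizing lazy list.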

I haven't compared the speed of using a memoized lazy list stream as compared to my non-memoizing CIS stream for the tree folding incremental primes algorithm, which would be an interesting test to see how much memoization costs...

In algorithms that actually need memoization, such as a Hamming/Smooth Numbers infinite stream, this may make the performance a little better, although it will never be all that fast or powerful, as these implementations in Snap! are limited by interpreting scripts in JavaScript...
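To illustrate why the Hamming-numbers stream, unlike the primes stream, genuinely needs memoization: each multiplier chain must re-read earlier output values, so the history has to be retained. A hedged Python sketch (my own illustration, with the shared list playing the role of the memo table):

```python
def hamming(n):
    # First n Hamming (5-smooth) numbers.  The three lagging indices
    # each re-read earlier entries of h, so the back-history of the
    # stream must be kept -- exactly what memoization provides.
    h = [1]
    i2 = i3 = i5 = 0
    while len(h) < n:
        nxt = min(2 * h[i2], 3 * h[i3], 5 * h[i5])
        h.append(nxt)
        if nxt == 2 * h[i2]: i2 += 1
        if nxt == 3 * h[i3]: i3 += 1
        if nxt == 5 * h[i5]: i5 += 1
    return h

print(hamming(10))  # -> [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]
```

In the primes algorithm above, by contrast, no stream ever looks backward, which is why the cheaper non-memoizing CIS suffices there.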

GordonBGood commented 21 hours ago

@brianharvey,

I did some timing, and the difference in performance between the current Streams implementation and my new one, where the states are buried inside closures, isn't all that much: it takes about 12 seconds to list a stream of 1000 numbers. This may be because JavaScript is particularly slow at forming closures, especially in Google Chrome and Firefox, and even twice as slow in Safari, which makes either technique slow, as both depend on "thunk" closures...

Using my primes algorithm, one will still be able to find the primes up to ten thousand in perhaps about the same amount of time as when not memoizing, since forming closures is likely the bottleneck. For the David Turner algorithm, your VM will time out finding primes to this range, as it would take hundreds of seconds to complete...