source-academy / sicp

XML sources of SICP and SICP JS, and support for generating Interactive SICP JS, PDF, e-book and comparison editions
https://sourceacademy.org/sicpjs
Creative Commons Attribution Share Alike 4.0 International
915 stars 122 forks source link

Book error: Exercise 3.57 is incorrect, memoization of add_streams is insufficient #1063

Open PossibilityZero opened 1 day ago

PossibilityZero commented 1 day ago

I believe I have found an error in the book.

The Error (Exercise 3.57)

Exercise 3.57 is defined as:

How many additions are performed when we compute the _n_th Fibonacci number using the declaration of fibs based on the add_streams function? Show that this number is exponentially greater than the number of additions performed if add_streams had used the function stream_map_2_optimized described in exercise 3.50.

This implies that the following would only require a linear number of additions:

function stream_map_2_optimized(f, s1, s2) {
    return is_null(s1) || is_null(s2)
        ? null
        : pair(f(head(s1), head(s2)),
                memo(() => stream_map_2_optimized(f, stream_tail(s1), stream_tail(s2))));
}

function add_streams(s1, s2) {
    return stream_map_2_optimized((x1, x2) => x1 + x2, s1, s2);
}

const fibs = pair(
    0, 
    () => pair(
        1,
        () => add_streams(stream_tail(fibs), fibs)));

However, when I actually tried it I found that the number of addition operations is actually identical. In fact, the memoized functions are never called. There's a whole bunch of state and stored functions that make reasoning about it extremely convoluted even for small indices, but the root cause is that in the definition of fibs, we are not fully memoizing the stream tail functions. The following fix correctly runs with a linear number of additions:

const fibs = pair(
    0, 
    memo(
        () => pair(
            1,
            memo(() => add_streams(stream_tail(fibs), fibs)))));

This correctly memoizes all stream_tail functions.


Comparison to 1985 original

I checked the original Scheme version, which does not have this defect. Comparing to the original Scheme version, Chapters 3.5.1 and 3.5.2 are some of the most heavily edited: https://sicp.sourceacademy.org/chapters/3.5.1.html, and in particular the JS version gets rid of cons-stream, instead usiong raw pair operations for all stream construction.

The Scheme version of exercise 3.5.7 is:

How many additions are performed when we compute the _n_th Fibonacci number using the definition of fibs based on the add-streams procedure? Show that the number of additions would be exponentially greater if we had implemented (delay <exp>) simply as (lambda () <exp>), without using the optimization provided by the memo-proc procedure described in section 3.5.1

The question is in fact sort of reversed; the implementation of fibs is optimal, and the exercise suggests thinking of a potential unoptimized version – namely, not using memoization. This works because the memoization is baked into the cons-stream procedure, which becomes the primitive constructor for all streams.

Given the functions already available, a JS implementation would be something like this:

function cons_stream(sHead, sTailFunc) {
    // memo function provided in Chapter 3.5.1
    return pair(sHead, memo(sTailFunc));
}

By using this constructor everywhere that we make streams with pair currently, the optimization becomes "baked in"; in fact, stream_map_2 can automatically take advantage of it without explicitly calling memo:

function stream_map_2(f, s1, s2) {
    return is_null(s1) || is_null(s2)
      ? null
      : cons_stream(
            f(head(s1), head(s2)),
            () => stream_map_2(f, stream_tail(s1), stream_tail(s2)));
}

function add_streams(s1, s2) {
    return stream_map_2((x1, x2) => x1 + x2, s1, s2);
}

const fibs = cons_stream(0,
        () => cons_stream(1, () => add_streams(stream_tail(fibs), fibs)));

The definition of fibs here is pretty much identical to the one given for Scheme:

(define fibs
  (cons-stream 0
               (cons-stream 1
                            (add-streams (stream-cdr fibs)
                                         fibs))))

Potential fixes

Short term

The exercise 3.57 as presented is mistaken, so it should be fixed. Here is a sample rewritten version:

How many additions are performed when we compute the _n_th Fibonacci number using the declaration of fibs based on the add_streams function? Explain why this number grows exponentially in relation to n. Using memo, provide an implementation of fibs which only uses a linear number of additions in relation to n. You may need the function stream_map_2_optimized described in exercise 3.50.

Long term (2nd edition?)

I don't know the reasons that the authors decided to do away with cons-stream in the JavaScript edition, but in my comparison this feels like a change that could be reverted. One strong message of the book as a whole is to use constructors and selectors as abstractions for the implementation details, so having con_stream as a constructor which wraps pair(sHead, memo(sTailFunc)) seems reasonable.

I did run into one problem when trying to fully lean into copying the Scheme implementation. Namely, I tried to define cons_stream as follows:

function cons_stream(sHead, sTail) {
    return pair(sHead, memo(() => sTail));
}

I could not longer create streams that referenced themselves:

const fibs = cons_stream(0,
        () => cons_stream(1, () => add_streams(stream_tail(fibs), fibs)));
// ReferenceError: Cannot access 'fibs' before initialization

I don't know if this is a difference between JS and Scheme on how expressions are treated, and I confess that my understanding is not deep enough to definitively show it one way or another. Skimming future sections, I see more material on delay, normal vs applicative order etc. so perhaps there is an explicit reason for changing this implementation.

However from what I can tell, there would be no drawback by providing the memoized implementation of cons_stream, where the two arguments must be a stream head, and a function returning the stream tail.

PossibilityZero commented 1 day ago

Debug code

I'm providing the following debugging code that I used. There are some syntax elements that are from standard JS and not Source, but it should be fairly straightforward to execute in a local NodeJS environment with the sicp npm package.

import { pair, is_null } from 'sicp';
import { head, stream_tail } from 'sicp';
import { stream_ref } from 'sicp';

const ITER = 25;
const OPT_ITER = 25;
const MEMO_ITER = 50;

function memo(fun) {
    let already_run = false;
    let result = undefined;
    return () => {
               if (!already_run) {
                   result = fun();
                   already_run = true;
                   return result;
               } else {
                  //console.log("returning stored");
                   return result;
               }
           };
}

(function unoptimized() {
  function stream_map_2(f, s1, s2) {
    return is_null(s1) || is_null(s2)
        ? null
        : pair(f(head(s1), head(s2)),
               () => stream_map_2(f, stream_tail(s1), stream_tail(s2)));
  }

  let addCalls = 0;
  function add_streams(s1, s2) {
    const add = (a, b) => {
      addCalls++;
      return a + b;
    };
    return stream_map_2(add, s1, s2);
  }

  const fibs = pair(0,
          () => pair(1, () => add_streams(stream_tail(fibs), fibs)));

  console.log("Unoptimized:");
  for (let i = 0; i < ITER; i++) {
    addCalls = 0;
    console.log(`Fib(${i+1}): ${stream_ref(fibs, i)}\tAdd calls: ${addCalls}`);
  }
  console.log();
})();

(function optimized() {
  function stream_map_2_optimized(f, s1, s2) {
    return is_null(s1) || is_null(s2)
        ? null
        : pair(f(head(s1), head(s2)),
               memo(() => stream_map_2_optimized(f, stream_tail(s1), stream_tail(s2))));
  }

  let addCalls = 0;
  function add_streams(s1, s2) {
    const add = (a, b) => {
      addCalls++;
      return a + b;
    };
    return stream_map_2_optimized(add, s1, s2);
  }

  // "optimized" fibs; doesn't correctly memoize all stream constructions
  const fibs = pair(0,
          () => pair(1, () => add_streams(stream_tail(fibs), fibs)));

  /* correctly memoized version - uncomment this section to use
  const fibs = pair(0,
          memo(() => pair(1, memo(() => add_streams(stream_tail(fibs), fibs)))));
  */

  console.log("Optimized:");
  for (let i = 0; i < OPT_ITER; i++) {
    addCalls = 0;
    console.log(`Fib(${i+1}): ${stream_ref(fibs, i)}\tAdd calls: ${addCalls}`);
  }
  console.log();
})();

(function all_streams_memoized() {
  function cons_stream(sHead, sTailFunc) {
    return pair(sHead, memo(sTailFunc));
  }

  function stream_map_2(f, s1, s2) {
    return is_null(s1) || is_null(s2)
        ? null
        : cons_stream(
              f(head(s1), head(s2)),
              () => stream_map_2(f, stream_tail(s1), stream_tail(s2)));
  }

  let addCalls = 0;
  function add_streams(s1, s2) {
    const add = (a, b) => {
      addCalls++;
      return a + b;
    };
    return stream_map_2(add, s1, s2);
  }

  const fibs = cons_stream(0,
          () => cons_stream(1, () => add_streams(stream_tail(fibs), fibs)));

  console.log("With cons_stream:");
  for (let i = 0; i < MEMO_ITER; i++) {
    addCalls = 0;
    console.log(`Fib(${i+1}): ${stream_ref(fibs, i)}\tAdd calls: ${addCalls}`);
  }
  console.log();
})();
martin-henz commented 1 day ago

Looking at your first post, you write:

However, when I actually tried it I found that the number of addition operations is actually identical. In fact, the memoized functions are never called.

Well, the optimization will be significant, when you use the result stream. Here is an example: https://share.sourceacademy.org/8v1bs You observe that repeated access of s2 does not lead to repeated application of the function that is passed to stream_map_2_optimized.

So I don't think this is a bug in SICP JS.

martin-henz commented 1 day ago

You are rightly pointing out that SICP JS 3.5 significantly deviates from SICP 3.5. The reason for this is that JavaScript does not have macros. In Scheme, the function cons_stream can be implemented with a macro so we can hide the creation of the function object. Macros effectively allow you to gloss over the delayed evaluation, whereas in JavaScript, the delayed evaluation needs to be made explicit. This led us to take a different pedagogical approach in this section.

PossibilityZero commented 1 day ago

Looking at your first post, you write:

However, when I actually tried it I found that the number of addition operations is actually identical. In fact, the memoized functions are never called.

Well, the optimization will be significant, when you use the result stream. Here is an example: https://share.sourceacademy.org/8v1bs You observe that repeated access of s2 does not lead to repeated application of the function that is passed to stream_map_2_optimized.

Yes, stream_map_2_optimized works as intended for correctly returning the memozied function.

However, the bug that I am pointing out is not with stream_map_2_optimized but rather in the implementation of fibs. The exercise suggests that changing add_stream to use stream_map_2_optimized would cut down the addition operations, but this turns out not to be the case. The cause is that we are constructing two more streams in the definition of fibs, which don't use memoization. The exponential factor is introduced not just from the new stream constructed within add_streams, but also from the definition of fibs.

I've added on to your example to demonstrate: https://share.sourceacademy.org/c7onj


You are rightly pointing out that SICP JS 3.5 significantly deviates from SICP 3.5. The reason for this is that JavaScript does not have macros. In Scheme, the function cons_stream can be implemented with a macro so we can hide the creation of the function object. Macros effectively allow you to gloss over the delayed evaluation, whereas in JavaScript, the delayed evaluation needs to be made explicit. This led us to take a different pedagogical approach in this section.

Makes sense, thank you for the explanation. Hopefully I'll understand this deeper once I've finished the book.

martin-henz commented 1 day ago

I see. Thanks for your patience explaining. Indeed, the problem is that the two tails in fibs are not memoized. Your solution is correct. So the bug lies in the formulation of Exercise 3.57. It should read as follows (with emphasis added):

Exercise 3.57

How many additions are performed when we compute the nth Fibonacci number using the declaration of fibs based on the add_streams function? Show that this number is exponentially greater than the number of additions performed if add_streams had used the function stream_map_2_optimized described in exercise 3.50, and if we had declared fibs like this:

const fibs = pair(0, memo(() => pair(1, memo(() => add_streams(stream_tail(fibs), fibs)))));

martin-henz commented 1 day ago

Indeed: A stream_pair function would allow us to hide the memoization. This would lead to an interesting alternative pedagogical approach in this section, and would take us closer to the original SICP. To be considered for a new edition. Thanks a lot for pointing this out!