purescript / purescript-prelude

The PureScript Prelude
BSD 3-Clause "New" or "Revised" License

fixes #309: call stack exceeded in Control.Bind.Bind instance for Array #311

Open michaelficarra opened 9 months ago

michaelficarra commented 9 months ago

Description of the change

Fixes #309 by batching the arguments passed to Array.prototype.push via Function.prototype.apply. I chose 10,000 for the batch size because it's well within the safe range for major implementations while still being large enough not to have much effect on performance.

I couldn't find where the tests for Control.Bind live so I didn't add a test.



JordanMartinez commented 9 months ago

There's likely not any tests for Control.Bind.

However, what is the underlying issue here and how does this fix the underlying issue?

michaelficarra commented 9 months ago

The underlying issue is that JS implementations have a limit to the number of arguments that can be passed to a function (even through Function.prototype.apply). The limit is implementation-specific, but generally ≥ 65,535. The repro in #309 was causing the second argument to Function.prototype.apply (the argument list for Array.prototype.push) to exceed this limit. The solution is to just call Array.prototype.push multiple times, batching the work.
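The batching strategy described here can be sketched roughly as follows (a minimal standalone sketch, not the PR's actual diff; `pushAll` and `CHUNK` are illustrative names):

```javascript
// Sketch of the batching idea: instead of one apply call whose
// argument list is the whole sub-array, push in fixed-size chunks
// so each apply call stays well under engine argument-count limits.
var CHUNK = 10000; // the batch size chosen in this PR

function pushAll(result, subArr) {
  for (var i = 0; i < subArr.length; i += CHUNK) {
    // slice past the end is safe: it just returns the final partial chunk
    Array.prototype.push.apply(result, subArr.slice(i, i + CHUNK));
  }
  return result;
}
```

With this shape, an input of 25,000 elements is pushed in three apply calls (10,000 + 10,000 + 5,000 arguments) instead of one 25,000-argument call.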

JordanMartinez commented 9 months ago

The limit is implementation-specific, but generally ≥ 65,535.

Where is that documented?

JordanMartinez commented 9 months ago

Found this on MDN docs:

The consequences of calling a function with too many arguments (that is, more than tens of thousands of arguments) is unspecified and varies across engines. (The JavaScriptCore engine has a hard-coded argument limit of 65536.) Most engines throw an exception; but there's no normative specification preventing other behaviors, such as arbitrarily limiting the number of arguments actually passed to the applied function. To illustrate this latter case: if such an engine had a limit of four arguments (actual limits are of course significantly higher), it would be as if the arguments 5, 6, 2, 3 had been passed to apply in the examples above, rather than the full array.

michaelficarra commented 9 months ago

Like all OOM errors, it is a violation of the JS spec. The JS spec requires infinite memory for compliance, which obviously no implementation can provide. The best I can do is this article on MDN which recommends a strategy like the one I've used here.

JordanMartinez commented 9 months ago

Since this is already a bug, I think fixing the underlying issue with a limit of 65535 is desirable because that's a known hard limit. If someone has proof that this number is still too large for a given JS engine, we could decrease it for that engine. But I think a limit of 10k is too small given that it potentially loops 6 times unnecessarily.

michaelficarra commented 9 months ago

Personally, I would prefer to stay well below the known upper bound. Implementations may reduce this limit in the future, and lesser-known implementations (such as those for embedded systems) may already have a lower limit today. I don't think the overhead of 6 loop iterations matters much for such heavy workloads. We could prove it out with benchmarking if we really wanted to know.

edit: The example from mdn uses 32768, half of JSC's limit. This seems like an okay compromise.

JordanMartinez commented 9 months ago

I still stand by what I said. When we find a new limit, we can decrease it. But until I have hard evidence of that, there's nothing to say that my arbitrary choice is better/worse than your arbitrary choice.

acple commented 9 months ago

The current change slices the array twice per loop, but I don't think that's necessary. How about this:

var v = f(arr[i]);
for (var j = 0; j < v.length;)
  Array.prototype.push.apply(result, v.slice(j, j += CHUNK));

But this always slices at least once, even when it isn't necessary. Should we consider that?

acple commented 9 months ago

I agree with 65535 as the hard limit, since JSC's hard limit is itself an arbitrary value.

Stack size is finite, and 0xFFFF seems as good an arbitrary limit as any other would be. :-)

So it should work fine for now.

JordanMartinez commented 9 months ago

The current change slices the array twice per loop, but I don't think that's necessary. How about this:

var v = f(arr[i]);
for (var j = 0; j < v.length;)
  Array.prototype.push.apply(result, v.slice(j, j += CHUNK));

But this always slices at least once, even when it isn't necessary. Should we consider that?

This does seem like a better implementation. Thanks for the suggestion.

JordanMartinez commented 9 months ago

But this always slice at least once if even not necessary, should we consider it?

This implementation only slices if needed at the cost of an if block. I'm not sure which is faster and we'd need to benchmark this to know.

var APPLY_CHUNK_SIZE = 10e3;
var push = Function.prototype.apply.bind(Array.prototype.push);

export const arrayBind = function (arr) {
  return function (f) {
    var result = [];
    for (var i = 0, l = arr.length; i < l; i++) {
      var subArr = f(arr[i]);
      if (subArr.length <= APPLY_CHUNK_SIZE) {
        push(result, subArr);
      } else {
        for (var j = 0; j < subArr.length;) {
          Array.prototype.push.apply(result, subArr.slice(j, j += APPLY_CHUNK_SIZE));
        }
        }
      }
    }
    return result;
  };
};

Testing would need to check for off-by-one errors here.
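A quick boundary check for such off-by-one errors might look like the following sketch. It inlines a copy of the chunked implementation above with a deliberately tiny chunk size so the batching path is exercised (the small APPLY_CHUNK_SIZE is purely for testing, not a suggested production value):

```javascript
// Copy of the proposed chunked arrayBind, with a tiny chunk size
// so arrays just below, at, and above the boundary hit both branches.
var APPLY_CHUNK_SIZE = 3;

var arrayBind = function (arr) {
  return function (f) {
    var result = [];
    for (var i = 0, l = arr.length; i < l; i++) {
      var subArr = f(arr[i]);
      if (subArr.length <= APPLY_CHUNK_SIZE) {
        Array.prototype.push.apply(result, subArr);
      } else {
        for (var j = 0; j < subArr.length;) {
          Array.prototype.push.apply(result, subArr.slice(j, j += APPLY_CHUNK_SIZE));
        }
      }
    }
    return result;
  };
};

// Check lengths around the chunk boundary: 2 (below), 3 (at),
// 4 (just above), and 7 (multiple chunks plus a remainder).
[2, 3, 4, 7].forEach(function (n) {
  var range = function (m) {
    return Array.from({ length: m }, function (_, i) { return i; });
  };
  var actual = arrayBind([n])(range);
  console.assert(actual.length === n, "length mismatch for " + n);
  console.assert(JSON.stringify(actual) === JSON.stringify(range(n)),
    "contents mismatch for " + n);
});
```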