tc39 / proposal-oom-fails-fast

Proposal: Out of memory immediately terminates agent cluster
Apache License 2.0
13 stars 4 forks source link

Alternate opt-in designs #1

Open syg opened 5 years ago

syg commented 5 years ago

This feature must be opt-in, as this change is web incompatible. Many web-facing code do in fact catch and process OOM and over-recursion errors. Also, singling out OOMs in general sounds scary for implementers and has open problems about making sure the host plays nice wrt handling OOMs.

Two alternate designs came up in TC39:

nothrow

From Waldemar. Syntax to annotate a function such that any exceptions arising from it self are uncatchable and terminate the agent.

Uncatchable exception

From @kmiller68. Spec a new UncatchableException so that any code that needs the no-throw invariant are free to try { ... } catch () { throw new UncatchableException } or something like that.

After thinking about it, uncatchable exceptions seem better all around: no new syntax, syntactic intent of termination, clear scope of code that requires the no-throw invariant.

ljharb commented 5 years ago

With uncatchable exceptions though, any code I call might throw one, and that also seems problematic - if i call a function within a try/catch, i rely on the catch triggering.

kmiller68 commented 5 years ago

Alternatively, we could have a TerminateExecutionException() that permanently kills an Agent and possibly all associated Agent clusters when thrown. This is a pretty powerful API so we might need a way to opt out or in to it.

syg commented 5 years ago

With uncatchable exceptions though, any code I call might throw one, and that also seems problematic - if i call a function within a try/catch, i rely on the catch triggering.

Well, nothrow has the same issue, no? This reliance on try/catch seems like a design issue intrinsic to any solution to the problem Mark proposed. Whether there's an uncatchable termination that is only triggered by OOM, or a userland uncatchable exception, or some other summary termination API, it will all result in the user being okay with calling code that may result in summary termination, and thus the try/catch not triggering.

If that is an unacceptable premise, then that should be more clearly communicated to the champion.

ljharb commented 5 years ago

I tried to convey that today in plenary; I find the concept of any uncatchable exception highly problematic.

syg commented 5 years ago

Uncatchable exception here really means termination. Do you find uncatchable exceptions unacceptable but not termination? Or do you find both unacceptable? I can understand the latter but not the former.

On Thu, Oct 3, 2019, 23:31 Jordan Harband notifications@github.com wrote:

I tried to convey that today in plenary; I find the concept of any uncatchable exception highly problematic.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Agoric/proposal-oom-fails-fast/issues/1?email_source=notifications&email_token=AAAXLUMBOOB7F2WTYCPKY73QM22HTA5CNFSM4I5HHBFKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAKH6WA#issuecomment-538214232, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAXLUNB74MFVNNHAV5KPZTQM22HTANCNFSM4I5HHBFA .

ljharb commented 5 years ago

When you say "termination", how does that interact with window.onerror, or node's onUncaughtException/onUnhandledRejection events?

erights commented 5 years ago

Attn @waldemarhorwat

syg commented 5 years ago

When you say "termination", how does that interact with window.onerror, or node's onUncaughtException/onUnhandledRejection events?

I'm under the impression window.onerror would be called. I'm not familiar enough with Node to give an opinion on the latter. @erights?

rhuanjl commented 4 years ago

I'm an outsider so sorry if I'm missing something but my read of this was that the point of this proposal is to eliminate a whole category of potential vulnerabilities where out of memory errors (for which the recovery can be highly inconsistent) can lead to undefined behaviour.

If that is the point having the fix be opt in would seem to defeat the point - you can't eliminate a vulnerability on an opt-in basis. UNLESS the idea is to only eliminate these potential vulnerabilities in specific "safe" environments where the opt in is set at the start.

xtuc commented 4 years ago

For Cloudflare workers we, ideally, wanted an OOM to be catchable to allow the user to report the failure and add some context. window.onerror would be called

caridy commented 4 years ago

Good timing! I suspect @erights and co. are raising this issue due to the fact that OOM exception is one of the most common ways to attack any sandboxing mechanism in JS. We have been researching on this recently, and it is my understand that OOM is still something that can be controlled by the membrane implementation to prevent leaking objects from another realm (yes, it is hard, but possible, we will share more this week). If this is the primary driver, then I believe we can get away without any change, considering that the alternative is to fail-fast, which is not web compatible as @syg pointed out.

erights commented 4 years ago

Hi @caridy , my concern is not only the integrity of membrane separation, but any stateful invariant whatsoever. See my doubly-linked-list example. If it is almost impossible to defend stateful invariants in general, secure programming becomes impractically hard.

This proposal is only web incompatible in the absence of an opt-in. Hence this issue thread and my last presentation on this proposal

https://www.youtube.com/watch?v=gzL_ymFtKuQ&list=PLzDw4TTug5O0ywHrOz4VevVTYr6Kj_KtW

which I think you saw.