laverdet / isolated-vm

Secure & isolated JS environments for nodejs
ISC License
2.18k stars 153 forks source link

Timeout doesn't work properly when isolate code uses `apply` (or basically, whenever it has to wait for an async result) #185

Open nsamsonau-pal opened 4 years ago

nsamsonau-pal commented 4 years ago

Shortest repro

const ivm = require('isolated-vm');
const isolate = new ivm.Isolate();
const context = isolate.createContextSync();
const global = context.global;
global.setSync(`_ivm`, ivm);

(async function() {
    global.setSync('proxiedFoo', () => {}, { reference: true });
    const result = await isolate
                    .compileScriptSync("proxiedFoo.apply(undefined, []).then(() => { while (true) {} });")
                    .run(context, { timeout: 2000, promise: true });
    console.log(result); // Never reaches here nor does it throw
})();

An interesting point is that using runSync instead of run returns a Promise, which never resolves.

Context

We are still currently on 1.7.11, and we have a hacky complex setup to allow calling an async function in an isolate and receive an async result (e.g. await MyIsolate.executeAsync("async () => { fetch(...) } ")). Before transfer options, we had to wrap our execution in a new Promise((resolve, reject) => /* proxy resolve, reject; call isolate with timeout */ }) and proxy resolve and reject, so the isolate can call resolve when the result is ready.

The problem with this setup is that the call to the isolate finishes before the result, because it doesn't actually return anything - it only schedules a fetch, which then calls the proxied resolve once the result is ready. Which means, the timeout does not apply to the entire execution, and then, if there is an issue in the isolate code, the isolate will hang forever. To circumvent that, we wrap it in setTimeout with the timeout + errorMargin and then dispose the isolate (which is way less graceful).

We were hoping that with the transfer options, this could be simplified to a simple call with transfer options { promise: true }, but the timeouts don't appear to solve our problem.

Really appreciate any help.

laverdet commented 4 years ago

Yeah this is a known footgun. This also results in errors that don't have anywhere to go:

const ivm = require('isolated-vm');
const isolate = new ivm.Isolate();
const context = isolate.createContextSync();
context.evalClosureSync(
  '$0.apply().then(() => { throw new Error("arst") })',
  [ () => {} ],
  { arguments: { reference: true } });
setTimeout(() => console.log(context.global.getSync('foo')), 1);

Here the call to getSync throws the error that was thrown from the evalClosure. The problem, as you figured out, is that the act of resolving a promise can invoke another isolate. There's no way to pass options to or return errors from the resolution, because then it would be possible to throw an error back again. Settling promises between two isolates turns into the well-known Two Generals' Problem.

I might be able to fix your issue by forwarding the timeout data to the promise resolver.. but if it does timeout then there's nowhere to even send the exception. I'd need to add some kind of callback system in order to listen for orphaned errors.

nsamsonov commented 4 years ago

(Writing from my personal account)

That's interesting. I see where this is coming from, but I perceived having options set as { timeout: 2000, promise: true } as "abort if there is no resolved result after 2000ms" (so I guess I hoped you did a similar setup to ours but on a lower level).

I also just remembered that OOM errors only propagate to the caller of the last execution leading to the OOM but not to any other pending callers. This is even more problematic because with timeouts at least our "safety net" setTimeout indicates there was one. With OOMs, the isolate just hangs until our safety net timeout kills it, but then we never know if it was an actual timeout or an OOM. Since in our case the call that OOMd the isolate resulted from a proxied function called from the same isolate (iso -> default -> iso -> oom), the error just gets lost, just as you just iterated.

Btw I just tested my example, and adding a mem limit and changing the body to proxiedFoo.apply(undefined, []).then(() => { s = ''; l =[]; while (true) { s = s + 'aaaaaa'; l.push(s); } }); results in an assertion failure rather than a proper OOM.

I can't tell what's best when handling errors: if all active promises should get OOM, or if there should be a centralized way to get errors, or if we should structure our code around this (but then the architecture gets very complex very quickly). For us, throwing OOM on all active invocations into the OOMing isolate and proxying timeouts to the promise resolver would solve many of our problems, but atm not sure if there are any implications with this.

laverdet commented 4 years ago

Probably the easiest solution I can suggest for now is to forward your handlers back to the nodejs isolate and invoke them explicitly using apply so that you can pass the timeout parameter again. So you would rewrite your example as this:

const ivm = require('isolated-vm');
const isolate = new ivm.Isolate();
const context = isolate.createContextSync();

(async function() {
  const { result } = await context.evalClosure(
    "return $0.apply(undefined, []).then(() => () => { while (true) {} })",
    [ () => {} ],
    { timeout: 2000, arguments: { reference: true }, result: { promise: true, reference: true } });
  const then = await result;
  await then.apply(null, [], { timeout: 2000 });
})().catch(console.error);

For more complicated cases I'd recommend maintaining an array of pending tasks from inside the isolate. Then you'd have some function you can call which just runs through the array and executes all the tasks. Basically you'll have to make sure that you aren't accidentally starting any "untrusted" work when you resolve a promise.

JSH32 commented 2 years ago

Found a new way to solve this but its a bit weird. You need to store the CPU time before you run anything and store a timeout. You basically do

const difference = Number(this.isolate.cpuTime/BigInt(1000000)) - this.cpuTimeStarted

To figure out how much time has passed in total then difference > this.timeout you have gone over timeout. If not just do this.timeout - difference and pass that to the apply. You can also do something like this if you want to dispose when everything is done for example.

    /**
     * Mark the start of a task
     * This should be called this any time an asyncronous operation is started within NodeJS land to prevent the isolate from disposing
     */
    private startTask() {
        this.runningTasks++
    }

    /**
     * This must be called after {@link Runtime.startTask} to signal that a pending asyncronous task in NodeJS has been completed
     * This allows the task counter to decrement and disposes of the isolate when the task counter reaches zero and if {@link Runtime.markDispose} was called
     */
    private endTask(success: boolean) {
        if (--this.runningTasks < 1) {
            // No more tasks equals runtime is doing nothing
            this.onCompleted(success)

            // If dispose was called and no more tasks are running dispose the isolate
            if (this.disposeCalled)
                this.isolate.dispose()
        }
    }