patriksimek / vm2

Advanced vm/sandbox for Node.js
MIT License
3.86k stars 293 forks source link

Lib memory leak #524

Closed GabrielLomba closed 1 year ago

GabrielLomba commented 1 year ago

The lib seems to be leaking sandboxes. Those are never collected because it appears they're attached to a closure in Object.prototype methods. I can reproduce using the sample code below and the latest vm2 version (3.9.17 at the time of writing):

const {VMScript, VM} = require('vm2');

// class to make it easier to find obj in heap dump
class Leak_sample { constructor(v){ this.v = v; } };

const main = async ()=>{
    const obj = new Leak_sample({data: 'should_be_collected'}), data = [];
    const sandbox = {
        fn: ()=>console.log('Called sandbox fn: ', obj.v),
        collect: d=>data.push(new Leak_sample(d)),
    };
    const s = new VMScript(`
        fn();
        collect({data: 1});
        collect({data: 2});
    `, 'runner.js');
    console.log('Running script');
    await new VM({eval: false, wasm: false, sandbox}).run(s);
    console.log('Ran script: ', data);
}

main().then(()=>{
    global.gc();
    console.log('Ran GC');
});

process.stdin.resume(); // prevents Node from exiting

I would expect to have all Leak_sample objects collected after finishing main() but, even after running global.gc() + using the "Collect garbage" feature in Chrome to be sure, the Leak_sample objects are still kept in memory:

2023-04-28_11h53_00

Shouldn't those objects have been collected? If not, what approach am I missing to force their collection and not result in an ever increasing memory usage?

XmiliaH commented 1 year ago

This library does not add any properties to Object.prototype. Furthermore, I do not know what the gc function does. The only thing I can say is that the posted code should not leak memory.

GabrielLomba commented 1 year ago

@XmiliaH I did not say it adds properties to Object.prototype. The code still touches it though, in bridge.js.

global.gc() forces the Garbage collector to run. Node exposes it if you run the script with the --expose-gc flag.

I can reproduce this behavior deterministically. If you run node --expose-gc --inspect vm2_leak.js and take heapdumps with Chrome, you should also see that all Leak_sample objects are kept in memory even though we're already out of main() scope. Did you try to reproduce it yourself?

XmiliaH commented 1 year ago

No, I did not try to reproduce this as I do not know what gc does and what it guarantees as it is possible that it will not collect the context object from vm modules which is a special object.

GabrielLomba commented 1 year ago

@XmiliaH if I understand correctly, the object is currently being kept in memory due to a closure in Object.prototype.__defineGetter__, which the lib also touches in setup-sandbox.js, not really related to the vm context object.

image

XmiliaH commented 1 year ago

Yes, we put the __defineGetter__ function into a WeakMap as seen in the image. But that should remove the entry when the key becomes dead.

GabrielLomba commented 1 year ago

I see in this comment that vm contexts used to not be collected due to GC heuristics (only really collected once we hit --max-old-space-size) but it's no longer the case in newer Node versions. I'm experiencing this in Node v14.19.0, however. Am I right by saying this is what's happening here? GC just does not want to dispose of the vm context because it has available memory elsewhere and this keeps the context Object.prototype copy in memory alongside everything else?

XmiliaH commented 1 year ago

I do not fully know how the vm context object integrates with the garbage collector, but I would assume that this is the case.

GabrielLomba commented 1 year ago

I have modified the script to allocate a significant amount of memory with Leak_sample and then read a 30MB randomly generated file and GC does end up collecting the Leak_sample objects when the process needs more memory (when we reach --max-old-space-size). There is no leak here. Appreciate the help @XmiliaH !