laverdet / isolated-vm

Secure & isolated JS environments for nodejs
ISC License
2.14k stars 152 forks source link

Including libraries in snapshot #121

Closed jasonayre closed 5 years ago

jasonayre commented 5 years ago

This library seems to be almost exactly what I'm looking for, except I can't figure out how to pass a library to a snapshot. The documentation seems to imply that it's possible:

(Isolate snapshots are a very useful feature if you intend to create several isolates running common libraries between them. )

So maybe I am missing a step? For sake of example, say I want to pass the ivm library into the snapshot, I've tried variations such as the following, to no avail. Am I doing something wrong or is this not supported, or bug? (Couldn't find any examples of doing it in the source/specs).


const snapshot = ivm.Isolate.createSnapshot([
  {code: ivm}
]);

const snapshot = ivm.Isolate.createSnapshot([
  {code: ivm.toString() }
]);

const snapshot = ivm.Isolate.createSnapshot([
  {code: `${ivm}` }
]);

const snapshot = ivm.Isolate.createSnapshot([
  {code: ivm + "" }
]);
```;
laverdet commented 5 years ago

The only thing that can go in in the code property is code.. like a string.

If you need the ivm module I recommend doing something like:

function bootstrap(ivm) {
  // use ivm here
}

and then pass the ivm module to this function.

jasonayre commented 5 years ago

Strange, I was actually able to pass most classes into the snapshot via ClassName + "\n", just not objects or modules. -- But would you mind expanding on that example, I'm still unsure how I can pass it into the snapshot context, to have the library available on the snapshot before I initialize anything (which, unless I am misunderstanding, once again the docs seem to imply it's possible) -- If you're referring to the bootstrap strategy in the docs, I played around with that but:

Stepping back to use case: I have an AI, and I have a main event store like model which is responsible for the state of everything, and I sometimes will simulate a particular action, to score its outcome. I.e.

game_state.simulate = (block) => { return new GameState(this.events).applyAction(block); }

So before each simulated decision, which suffers from o(n) right now due to having to load all the games events for every simulation, and every simulation getting slower the more events there are/possible targets to consider, I'm trying to come up with a faster way to load the game state in an isolated snapshot, in order to simulate an action, return/store a POJO w some data on the results, and discard it immediately after. So I'm trying to have the library already required in the snapshot as a first step, see if I can snapshot the instance with the events loaded as a second step.

Anyways I mention all that for clarity and or if you know of a better way I could accomplish. But step 1 would be just getting the library itself included into the snapshot, which I can't seem to get working. (I mentioned including ivm above just as an example, I don't actually need it). In the bootstrap example it was doing jail.setSync('_ivm', ivm); - I tried passing my library in the same way but it failed. Will injecting into the bootstrap function (which still unclear how to set it up differently if so), be able to work around that?

Thanks for the lib btw!

laverdet commented 5 years ago

Strange, I was actually able to pass most classes into the snapshot via ClassName + "\n", just not objects or modules.

This will work on simple cases but I wouldn't advise it in a real application.

So I'm trying to have the library already required in the snapshot as a first step, see if I can snapshot the instance with the events loaded as a second step.

I don't think this is a good idea. Snapshots are meant for code not data. Also why can't you just save your game state as a JSON object and import/export it to the isolate with ExternalCopy?

I tried passing my library in the same way but it failed.

You can only use set on things that are transferable.

jasonayre commented 5 years ago

Strange, I was actually able to pass most classes into the snapshot via ClassName + "\n", just not objects or modules.

This will work on simple cases but I wouldn't advise it in a real application. Ya I'm not planning on it, doesn't do me any good without being able to include a whole library though.

So I'm trying to have the library already required in the snapshot as a first step, see if I can snapshot the instance with the events loaded as a second step.

I don't think this is a good idea. Snapshots are meant for code not data. Also why can't you just save your game state as a JSON object and import/export it to the isolate with ExternalCopy?

I was unsure as to whether it was or not also, but I'm fairly certain it's the right way to do things now, I'll explain further below in new comment for clarity. -- It's a very complex game with many objects, heavily event driven, but most importantly, because there are cards which have abilities such as "if you did X this turn, gain Y reward", which would be a nightmare to program if I was trying to write to a shared state. So TL;DR; because it's largely "stateless"

I tried passing my library in the same way but it failed.

You can only use set on things that are transferable.

So is it not possible to actually embed a library into the snapshot right now then? It would be a really useful feature to add imo, and docs make it sound possible.

The reason I was optimistic about this lib was because it seemed to mirror the apis of a library I've been using which fits my needs perfectly, except that it's a ruby library and I'm trying to run the library in JS land. - I've been using it for a separate part of my application, doing similar things, but point is I know all the stuff is possible because the ruby library is able to do it, and I am doing so currently. It might be worth taking a look at as it's quite powerful/simple. https://github.com/discourse/mini_racer

I hadn't actually tried loading the AI library in my ruby app because I'm not planning on doing so, so I went ahead and did so to benchmark. Even benchmarking it in a rails app, unoptimized, with the overhead of passing data between ruby and node vms, it's still 3x faster, due to the use of a preloaded snapshot. I'd imagine if I was doing this in node with a similar api, it would be much faster (rails is fairly slow/memory hog also).

Some further details: I'm just benchmarking the reloading the game state (which in the case of ruby, I'm just creating a new isolate vm and passing the context/snapshot to it to load the state, also setting a var on the game state), test game state has 650 or so events. Ran a few times each and got similar results each time.

***Ruby managed node VMs (via miniracer):

***Node

laverdet commented 5 years ago

So is it not possible to actually embed a library into the snapshot right now then? It would be a really useful feature to add imo, and docs make it sound possible.

It is definitely possible. Screeps creates a snapshot with at least a megabyte of JS code. The problem is that you are trying to create a snapshot which contains C++ code (all isolated-vm functions are C++).

I looked at mini_racer and indeed it is very similar to isolated-vm. The snapshot functionality is almost identical to isolated-vm though, so I'm not sure where you're running into problems. Specifically mini_racer has the same limitations as isolated-vm, which are indeed limitations of v8:

There is an important limitation to snapshots: they can only capture V8’s
heap. Any interaction from V8 with the outside is off-limits when creating the
snapshot. Such interactions include:

 * defining and calling API callbacks (i.e. functions created via v8::FunctionTemplate)
 * creating typed arrays, since the backing store may be allocated outside of V8

And of course, values derived from sources such as `Math.random` or `Date.now`
are fixed once the snapshot has been captured. They are no longer really random
nor reflect the current time.

I think the main issue here is that mini_racer does a lot of things for you "automagically" which I've tried to avoid in this library. For example you're unable to take an instance of Object out of an isolate without using ivm.ExternalCopy or ivm.Reference. There are decisions regarding both correctness and performance that I don't want to make for the user of this API so instead the user is forced to make that decision themselves.

jasonayre commented 5 years ago

Ya so, I'm not running into any v8 limitations (only trying to do exactly what I'm doing in mini racer), and I was only trying to pass isolated-vm in for sake of example, I was trying to pass my own library in outside of example provided.

So now that I know it should work, I went back and I think I figured out the library loading issue, looks like I need to do:

let snapshot = ivm.Isolate.createSnapshot([{
  code: `${fs.readFileSync(filepath, 'utf8')};
}]);

I was either not passing in utf8 argument, or may have been missing a semi colon.

I'm very close to being able to mirroring the ruby loading. Everything is working, except the most important part :) -- loading the events into the game state when creating the snapshot. Everytime I try to load it I get a

Segmentation fault: 11 -- Which is apparently a can no longer access memory issue.

Any idea what could be causing that? I am doing the following, which is now the exact same strategy I am using with mini-racer.

let snapshot = ivm.Isolate.createSnapshot([{
  code: `${fs.readFileSync(my_lib_filepath, 'utf8')};
  this['current_events'] = ${JSON.stringify(events_data)};
  MyLib['Models']['GameState'].FromJSON(JSON.stringify(this['current_events']));
`}]);

sidenote: I am not sure why I need to double stringify, also cannot pass in the current events directly for some reason, json parser always blows up if I try, but the events look perfectly fine when I context.global.getSync('current_events').copySync(), so this might be related but probably not


let isolate = new ivm.Isolate({ snapshot });
let script = isolate.compileScriptSync(`this['current_game_state'] = MyLib['Models']['GameState'].FromJSON(JSON.stringify(this['current_events']));`);
let result  =script.runSync(isolate.createContextSync({release: false}));
```;

script.runSync is what triggers the SegFault.

Any ideas what could cause that or why it would work on mini racer but not here? (loading same library on disk, using same events file for both)
laverdet commented 5 years ago

I was either not passing in utf8 argument, or may have been missing a semi colon.

The semicolon is optional but the 'utf8' argument is important. Otherwise this function will returns Buffer instead of a string. If you're new to nodejs from ruby you'll probably bump into a lot of stuff like this.

Segmentation fault: 11 -- Which is apparently a can no longer access memory issue.

Segmentation faults shouldn't be happening under any conditions, so that is probably my fault. Would you be able to post the contents of the library and events_data? You can email me if you prefer marcel -a- laverdet.com.

jasonayre commented 5 years ago

I was either not passing in utf8 argument, or may have been missing a semi colon.

The semicolon is optional but the 'utf8' argument is important. Otherwise this function will returns Buffer instead of a string. If you're new to nodejs from ruby you'll probably bump into a lot of stuff like this.

Yea, though I've had a couple of issues where adding a semi colon where I assumed was optional, fixed script from breaking (while trying to get this working) -- And I've had to do it before when reading file but have limited ram in my brain to allocate to wonky node apis :)

Segmentation fault: 11 -- Which is apparently a can no longer access memory issue.

Segmentation faults shouldn't be happening under any conditions, so that is probably my fault. Would you be able to post the contents of the library and events_data? You can email me if you prefer marcel -a- laverdet.com.

Anyways a quick update, I got the initial part working shortly after opening but was still having issues. Strange part is, I have no idea what I did to fix the initial game state load, I thought I'd figure it out while working through so I didn't undo, but should have, because I swear I didn't change anything. But it must have been another minor nuance, or maybe webpack was compiling slow so my changes were getting super delayed, not really sure.

In terms of what's still broken, I can't share full repo but I will get a reproduce working hopefully tonight when I have time to look at it again. I do have a guess as to the issue though:

The parts which are still breaking all seem to be recursive functions, I narrowed it down that far. So, I'm wondering if maybe it has something to do with the non standard return behavior and how that's being captured and sent back to the outer context (i.e. how I don't have to call return at the end of the last function called for it to return the call at the end).

Which is just a suspicion. But the part I know for certain is a couple of methods in the game loop which recursively add/remove/process events, are breaking, as well as a couple of lodash helper methods I have. But I will try and get a repro tonight at some point or worst case tomorrow