Open andrewchambers opened 4 years ago
Just a note: on Linux it's possible to use a program called `smem` to view page sharing in experiments. I also did some experiments using https://github.com/andrewchambers/janet-fork, which may be useful for testing.
This is definitely something I will think about, but exactly what solution we use is definitely up for debate. The magic bit solution definitely seems the simplest to add.
Another thing to consider here is whether it allows the OS to page out infrequently used bytecode.
> Janet web servers, especially with large amounts of bytecode and embedded resources would benefit greatly (in the form of reduced memory overhead) if there were ways to share this memory across VM process instances.
This is a built-in feature of LuaJIT, and a big factor in its high performance in some larger codebases.
Basic example with a Lua module named `foo`:

```
$ cat foo.lua
return { bar = function() return 42 end }
$ luajit -e 'print(require("foo").bar())'
42
```
Use LuaJIT to compile `foo.lua` into a native module `foo_cow.so`:

```
$ luajit -b -t c -n foo_cow foo.lua foo_cow.c
$ gcc -fPIC -shared foo_cow.c -o foo_cow.so
$ nm foo_cow.so | grep lua
luaJIT_BC_foo_cow
```
Which can now be shared across VM process instances:

```
$ luajit -e 'print(require("foo_cow").bar())'
42
```
This is simpler (and safer) than having to modify the GC or memory paging behavior. Unless I'm misunderstanding, this same technique should work for Janet as well.
> This is simpler (and safer) than having to modify the GC or memory paging behavior.
I am not sure it is simpler, as it would require writing a Janet bytecode-to-native-code compiler, but it is still an alternative and cool approach.
> I am not sure it is simpler, as it would require writing a Janet bytecode-to-native-code compiler, but it is still an alternative and cool approach.
No no, much simpler: no native AOT. It just embeds the bytecode as an exported symbol in the native lib:
```
$ cat foo_cow.c
#ifdef __cplusplus
extern "C"
#endif
#ifdef _WIN32
__declspec(dllexport)
#endif
const unsigned char luaJIT_BC_foo_cow[] = {
27,76,74,2,10,15,0,0,1,0,0,0,2,41,0,42,0,76,0,2,0,35,3,0,2,0,3,0,5,53,0,1,0,
51,1,0,0,61,1,2,0,50,0,0,128,76,0,2,0,8,98,97,114,1,0,0,0,0
};
```
This still needs modifications in the GC, since the embedded bytecode would need to be marked as special so it isn't freed - but I do think this is probably the best approach. This would be a good feature to add to jpm-generated binaries.
> This still needs modifications in the gc since the embedded bytecode would need to be marked as special so it isn't freed.
Yes, although this is an internal implementation detail. That is very different from exposing a new user-facing API that can manipulate the GC. I would caution against exposing something like `(mark-objects-with-magic-bit)`.
Oh yeah, definitely - that magic-bit thing is a bad idea in retrospect; the way Ruby handles it is more reasonable.
Janet web servers, especially those with large amounts of bytecode and embedded resources, would benefit greatly (in the form of reduced memory overhead) if there were ways to share this memory across VM process instances.
A typical way to do this is to start your application, listen on a socket, then fork N times, with each fork able to accept and handle connections. This way the loaded embedded resources and bytecode are all stored in shared COW pages in memory.
Unfortunately, the current Janet VM trashes these COW pages by writing the mark bit during GC. Ruby (mostly?) solved this in 2.0 by moving the mark bits into a compressed bitmap, in this patch:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/41916
This is one way to solve it, though perhaps not the only way.
Another way may be a special GC bit marking all live, immutable, non-reference-containing GC objects as permanently marked - they would never be collected and never need to be walked again (or have their mark bit written to).
This solution also makes GC more efficient for an upfront cost, as we can skip these objects in future collections. I'm unsure whether there is an efficient way to compute the relevant object subgraphs, and there may be some complicated details, but it seems worth more experiments.