Open andrewchambers opened 4 years ago
Just a note: on Linux it's possible to use a program called `smem` to view page sharing in experiments. I also did some experiments using https://github.com/andrewchambers/janet-fork, which may be useful for testing.
This is definitely something I will think about, but exactly what solution we use is definitely up for debate. The magic bit solution definitely seems the simplest to add.
Another thing to consider here is whether it allows the OS to page out infrequently used bytecode.
> Janet web servers, especially with large amounts of bytecode and embedded resources would benefit greatly (in the form of reduced memory overhead) if there were ways to share this memory across VM process instances.
This is a built-in feature of LuaJIT, and a big factor in its high performance in some larger codebases.
Basic example with a Lua module named `foo`:

```
$ cat foo.lua
return { bar = function() return 42 end }
$ luajit -e 'print(require("foo").bar())'
42
```
Use LuaJIT to compile `foo.lua` into a native module `foo_cow.so`:

```
$ luajit -b -t c -n foo_cow foo.lua foo_cow.c
$ gcc -fPIC -shared foo_cow.c -o foo_cow.so
$ nm foo_cow.so | grep lua
luaJIT_BC_foo_cow
```
Which can now be shared across VM process instances:

```
$ luajit -e 'print(require("foo_cow").bar())'
42
```
This is simpler (and safer) than having to modify the GC or memory paging behavior. Unless I'm misunderstanding, this same technique should work for Janet as well.
> This is simpler (and safer) than having to modify the GC or memory paging behavior.
I am not sure it is simpler, as it would require writing a Janet bytecode-to-native-code compiler, but it is still an alternative and cool approach.
> I am not sure it is simpler, as it would require writing a Janet bytecode-to-native-code compiler, but it is still an alternative and cool approach.
No no, much simpler: no native AOT. It just embeds the bytecode as an exported symbol in the native lib:
```
$ cat foo_cow.c
#ifdef __cplusplus
extern "C"
#endif
#ifdef _WIN32
__declspec(dllexport)
#endif
const unsigned char luaJIT_BC_foo_cow[] = {
27,76,74,2,10,15,0,0,1,0,0,0,2,41,0,42,0,76,0,2,0,35,3,0,2,0,3,0,5,53,0,1,0,
51,1,0,0,61,1,2,0,50,0,0,128,76,0,2,0,8,98,97,114,1,0,0,0,0
};
```
This still needs modifications in the GC, since the embedded bytecode would need to be marked as special so it isn't freed - but I do think this is probably the best approach. This would be a good feature to add to jpm-generated binaries.
> This still needs modifications in the gc since the embedded bytecode would need to be marked as special so it isn't freed.
Yes, although this is an internal implementation detail. That is very different from exposing a new user-facing API that can manipulate the GC. I would caution against exposing something like `(mark-objects-with-magic-bit)`.
Oh yeah, definitely - that magic-bit thing is a bad idea in retrospect; the way Ruby handles it is more reasonable.
Janet web servers, especially those with large amounts of bytecode and embedded resources, would benefit greatly (in the form of reduced memory overhead) if there were ways to share this memory across VM process instances.
A typical way to do this is to start your application, listen on a socket, then fork N times, with each fork able to accept and handle connections. This way the loaded embedded resources and bytecode are all stored in shared COW pages in memory.
Unfortunately, the current Janet VM trashes these COW pages by writing the mark bit during GC. Ruby (mostly?) solved this in 2.0 by moving the mark bits into a compressed bitmap, in this patch:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/41916
This is one way to solve it, though perhaps not the only way.
Another way may be a special GC bit marking all live, immutable, non-reference-containing GC objects as permanently marked - they would never be collected and never need to be walked again (or have their mark bit written to).
This solution also makes GC more efficient for an upfront cost, as we can skip these objects in future collections. I'm unsure whether there is an efficient way to compute the relevant object subgraphs, and there may be some complicated details, but it seems worth more experiments.