udem-dlteam / ribbit

A small and portable Scheme implementation with AOT and incremental compilers that fits in 4K. It supports closures, tail calls, first-class continuations and a REPL.
BSD 3-Clause "New" or "Revised" License

ideas: persistent heap or something else? #64

Open ec1oud opened 1 month ago

ec1oud commented 1 month ago

The idea to make the bytecode portable sounds good to me. But how portable? I think it's ok to use a little-endian representation by default and be limited to those platforms for now (and maybe add a less-efficient implementation for big-endian machines later). (Assuming the bytecode is binary... it looks like it's not, though.) I'd like "portable" to include Plan 9, but then there are difficulties like not having POSIX system calls, and if you generate C code it's not likely to compile without some translation overhead (ape, npe or homegrown macros; and even then, the compiler balks at Cheney-style functions, as I found out).

So it should be an advantage of bytecode that only the rvm really needs porting; and if it's that simple, it shouldn't be hard in theory. After the whole Java experience I got kind of turned against VMs and bytecode, and I do keep wondering why they keep needing to be reinvented, but ok... I agree that bytecode portability is easier than writing a bunch of compilers or trying to work with a diverse set of existing compilers. The fact that there are so many implementations of rvm already is impressive.

Smalltalk was mentioned once in the paper. But Smalltalk has some sort of persistent sandbox (the image), and no pointers, only references. What's the implementation? Just an offset from the beginning of the sandbox?

Lisp data structures typically use too many pointers (unless you prefer vectors and records over linked lists). Most machines have MMUs: can the MMU end up doing the translation from a file offset to a pointer? So is there any hope of being able to suspend, resume and maybe even migrate processes, along with their data, not just the bytecode? Maybe even between machine architectures? IMO that's the frontier for portability. If you give up on supporting multiple architectures, OpenMosix used to work ok (some things should be done in the OS rather than in a language: is this one of them?), and there are a couple of other solutions like that (mostly unmaintained, it seems).

But if you are going with bytecode anyway, the code is already portable. I think it's important to start with portable data, though. Code is data, after all. And one place to start is to just do whatever Smalltalk does. But then in some situations maybe it gets to be a bummer if all the data is in one big sandbox and you can't start up without it. Maybe it's ok for small individual processes with small sandboxes (using CSP to build bigger software systems), or if a bigger sandbox is accessed in chunks on demand. mmap does that anyway, but presumes that the whole file is present.

And with any heap, it's extra work to keep it cache-friendly by keeping linked data together. Isn't that a big disadvantage for Lisps now? Fragmented mmap'd data would be even worse than fragmented memory.
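As a toy sketch of the offset-instead-of-pointer idea (in Python, with a made-up 16-byte cell layout; this is not anything Ribbit does today): links in the heap file are byte offsets from the start of the mapping, so the file stays valid no matter where the OS maps it.

```python
import mmap, os, struct, tempfile

CELL = struct.Struct("<qq")   # hypothetical cell: (value, next-offset)
NIL = -1                      # -1 marks the end of a list

def write_list(path, values):
    """Serialize values as fixed-size cells; each cell links to the previous one."""
    with open(path, "wb") as f:
        nxt = NIL
        for off, v in enumerate(reversed(values)):
            f.write(CELL.pack(v, nxt))
            nxt = off * CELL.size    # this cell's offset becomes the next link
    return nxt                        # offset of the head cell

def read_list(path, head):
    """Walk the list by resolving offsets relative to the mapped base."""
    out = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        off = head
        while off != NIL:
            v, off = CELL.unpack_from(m, off)
            out.append(v)
    return out

path = os.path.join(tempfile.mkdtemp(), "heap.bin")
head = write_list(path, [1, 2, 3])
print(read_list(path, head))   # → [1, 2, 3]
```

Whether the offset→pointer translation happens in software like this or falls out of mapping the file at a fixed base is exactly the MMU question above.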

Another idea is to stop using pointers in persistent data. But what could be used instead? Some sort of hash that is small enough? If you replace pointers with hashes, either the scheme is only really suitable for immutable data, or you need more indirection (hash tables, tree-based indexing or whatever). UUIDs might be an alternative, but looking up a value costs more than following a pointer. In either case, 128 bits is a more suitable size (and then maybe take a few bits away for the usual type tagging). Maybe a cons cell has to take 32 bytes if both the car and cdr can potentially contain something twice as big as a pointer. So it can't possibly be as efficient, right? But ribs are already bigger than the smallest-possible cons cell: you are trading more memory for more speed, right? Maybe a compromise would be a rib that can contain one 128-bit hash or UUID and one pointer: linked lists would have to remain volatile, but at least they could point to persistent records, vectors and such.
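To make the hash-for-pointer idea concrete, here's a toy content-addressed cons store (Python, with blake2b truncated to 128 bits; the store layout is made up for the example). A cell's identity is the hash of its contents, which is exactly why it only works for immutable data, and why identical tails dedupe for free:

```python
import hashlib

store = {}            # digest -> (car, cdr-digest): the extra indirection
NIL = b"\x00" * 16    # sentinel "digest" for the empty list

def cons(car, cdr_digest):
    payload = repr(car).encode() + cdr_digest
    h = hashlib.blake2b(payload, digest_size=16).digest()  # 128-bit id
    store[h] = (car, cdr_digest)
    return h

def to_list(h):
    out = []
    while h != NIL:
        car, h = store[h]
        out.append(car)
    return out

tail = cons(2, cons(3, NIL))
a = cons(1, tail)   # two lists share the same tail cells;
b = cons(0, tail)   # only the head cells differ in the store
print(to_list(a), to_list(b), len(store))   # → [1, 2, 3] [0, 2, 3] 4
```

Note what's missing: set-cdr! can't exist here, because changing a cell would change its hash and orphan everything pointing at it.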

I really want to try something like this. I want to reduce the need for serialization. Some day we will have unified memory (and already, if you mmap a file on an SSD, maybe that looks something like what the future will look like). Another inspiration is things like IPFS/IPLD: use hashes instead of pointers, and then you get content-addressing rather than having to know where the data lives (but their hashes are too big; and if you don't know where the data lives, probably you end up doing a lot of serialization, or at least network streaming of persistent chunks).

Maybe there just needs to be a specifically persistent sandbox for certain kinds of data structures, while the rest can stay as volatile as it always was. Functional-style Scheme generates lots of garbage; whereas if your data structures are mmap'd, you'd probably rather mutate them in place and minimize writes. On the other hand, with functional style it ought to be fine to replace pointers with hashes: typically you only reuse some list tails and rebuild the list up to the head anyway. set-cdr! would either be impossible or terribly inefficient, though.

Once you have mmap-able persistent portable structures, there are at least a couple of approaches to network transparency that get easier, and that's one reason that I think about this.

I suppose the off-the-shelf solution is to just pick a disk-based KV store, write or find a Scheme API and use it. But those tend to be big, less portable and faddish; language-integrated persistence seems like it would be nicer. It seems to me that the main things are to decide what to replace pointers with, and to make that mechanism as fast as possible in the VM.

leo-ard commented 2 weeks ago

Hey! Thanks for the feedback and interest in Ribbit. If I understand correctly, you are exploring the idea of portability of data and code in the context of Ribbit.

As you mention, Ribbit's bytecode is not really binary, and it is not directly portable from one RVM to another. The bytecode is a serialization of a graph-like structure composed exclusively of ribs. A rib is a 3-field structure, and the rib graph represents the whole program, including data, stack and code. A rib can be a Scheme value, an RVM instruction, or even contain RVM-specific information. When an RVM runs a program, it deserializes the bytecode into the rib graph and "executes" it.
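A toy illustration of that shape (in Python; the tag value and instruction names are made up here, this is not Ribbit's actual encoding): every node is the same 3-field record, whether it holds data or code, and for code the third field links to the next instruction.

```python
class Rib:
    __slots__ = ("f0", "f1", "f2")   # three fields, immediates or links
    def __init__(self, f0, f1, f2):
        self.f0, self.f1, self.f2 = f0, f1, f2

PAIR = 0   # hypothetical type tag for a pair

def list_to_ribs(xs):
    acc = None                       # nil
    for x in reversed(xs):
        acc = Rib(x, acc, PAIR)      # car, cdr, type tag
    return acc

def ribs_to_list(rib):
    out = []
    while rib is not None:
        out.append(rib.f0)
        rib = rib.f1
    return out

# Code uses the same node shape: f2 of an instruction rib points at the
# next instruction — the per-step indirection discussed below.
halt = Rib("halt", None, None)
prog = Rib("push", 1, Rib("push", 2, halt))

def trace(pc):
    ops = []
    while pc is not None:
        ops.append(pc.f0)
        pc = pc.f2
    return ops

print(ribs_to_list(list_to_ribs([1, 2, 3])), trace(prog))
# → [1, 2, 3] ['push', 'push', 'halt']
```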

To make Ribbit portable, the question is how to exchange this graph structure from one RVM to another. One obvious way of doing it is to serialize and deserialize it, but maybe other ways can be explored?

Right now, we are using direct pointers inside the ribs. Maybe an additional indirection could help, but I feel like this would be a big overhead, as Ribbit heavily relies on these indirections. For example, each time an instruction is executed, an indirection is used to fetch the next instruction (the third field of the rib). Using an indirection or hash only for RVM-specific data (such as vectors or file descriptors) could ease the transfer of this information between RVMs.

However, another difficulty of porting the graph structure to another RVM is that it is specialized to the source code. For example, the indexes of the primitives change depending on their usage, to minimize the size of the resulting serialization: because we need to encode a primitive's index each time the source code calls it, giving smaller indexes to the most widely used primitives results in a smaller serialization. This is just one example of specialization; there are others as well. This makes it more challenging to port the graph structure, as we either need to sync both RVMs to match their internal primitive tables, or we need to standardize them at the cost of a bigger serialization.
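A small sketch of that specialization (in Python; the primitive names, usage counts and the "one byte below index 16" encoding are all made up for illustration, not Ribbit's real encoding): assigning indexes by call frequency beats a fixed assignment under any variable-length encoding.

```python
from collections import Counter

def assign_indexes(calls):
    """Smallest index goes to the most frequently called primitive."""
    return {name: i for i, (name, _) in enumerate(Counter(calls).most_common())}

def encoded_size(calls, table):
    # pretend indexes below 16 fit in one byte and the rest need two
    return sum(1 if table[c] < 16 else 2 for c in calls)

# a hypothetical program: two hot primitives, thirty rarely used ones
calls = ["times"] * 40 + ["sub"] * 20 + ["prim%02d" % i for i in range(30)]

by_freq = assign_indexes(calls)
fixed = {n: i for i, n in enumerate(sorted(set(calls)))}   # usage-blind table
print(encoded_size(calls, by_freq), encoded_size(calls, fixed))  # → 106 164
```

The flip side is exactly the porting problem described above: `by_freq` is only meaningful to an RVM that was built against the same source program.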

I don't know what the best strategy is to share code between RVMs, although I feel like Ribbit would be well suited for this task.

ec1oud commented 2 weeks ago

I'd have thought that bytecode portability would be a goal, eventually.

And having to follow a pointer each time to fetch the next instruction sounds like the typical linked-list bottleneck, and replacing pointers with hashes or something like that would make it worse. So I've had an idea for a long time: either implement a Scheme that uses vectors as its primary data type, or some sort of hybrid, like linked lists of bigger nodes (car carriers, let's call them). Then if each non-inline entry is a hash or whatever, it's more affordable, because at least the list itself is contiguous and cache-friendly. So I'm trying to do that; we'll see what goes wrong. But surely somebody else thought of that ages ago...
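For the record, a minimal sketch of what I mean by car carriers (Python, with an arbitrary chunk size; this shape is known elsewhere as an "unrolled linked list"): most traversal is a contiguous scan, and only one link per CHUNK elements needs following.

```python
CHUNK = 4   # arbitrary carrier capacity for the example

class Carrier:
    __slots__ = ("cars", "next")
    def __init__(self, nxt=None):
        self.cars = []        # up to CHUNK elements, stored contiguously
        self.next = nxt       # link to the next carrier (or None)

def prepend(head, x):
    """Cons onto the front, starting a new carrier only when the head is full."""
    if head is None or len(head.cars) == CHUNK:
        head = Carrier(head)
    head.cars.insert(0, x)    # O(CHUNK) shift, but CHUNK is a constant
    return head

def to_list(head):
    out = []
    while head is not None:
        out.extend(head.cars)
        head = head.next
    return out

head = None
for x in range(9, -1, -1):    # prepend 9, 8, ..., 0
    head = prepend(head, x)
print(to_list(head))   # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Prepending stays cheap, and a hash or UUID per carrier (instead of per cell) is what makes the fat-reference idea affordable.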

I think this is the crux of why Lisps are seen as an academic-only solution. I remember the ads for PCs back in the '80s when I bought my first one: "zero wait states" was a popular bullet point. (Mine had that, at 10 MHz.) That hasn't been possible for a long time now. When memory was as fast as the CPU, presumably chasing pointers all over RAM was just as cheap as any other kind of memory access (although spending limited memory to store pointers could have been a concern; and I didn't know about linked lists back then anyway).

But now the key to performance is to keep the data contiguous, in the order that you need to access it; so linked lists are mostly obsolete IMO, except when you really need extreme flexibility for rearranging the list. In practice I never used set-cdr! much anyway, so I tend to think most list access is done in-order in most programs. Prepending had better be cheap, but that's still possible to optimize even with vectors.