koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org

[Feature Proposal] Relocatable Sharable Memory (Zero-Copy Structural Data Packs) as an Effect #543

Open complyue opened 1 month ago

complyue commented 1 month ago

I see that heap::H at the type level has been used successfully in Koka for scoping checks that ensure soundness and effect isolation.

For some time I have been looking for a way to build and share data packs via mmap'able files. A builder process populates arbitrarily complex data structures through the mmap'ed address space, then unmaps (freezes) the file. Other processes can subsequently mmap the data file (usually read-only) and use the data graph with zero-copy performance, getting on-demand loading and out-of-core processing for free.
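To make the workflow concrete, here is a minimal sketch in Python (not Koka) of the builder/reader split, using only the standard `mmap` and `struct` modules. The file layout (a 32-bit count followed by 64-bit integers at offset 8), the fixed `PACK_SIZE`, and the names `build_pack`, `read_pack`, and `demo` are all assumptions made up for illustration; a real shilo would hold a full data graph.

```python
import mmap
import os
import struct
import tempfile

PACK_SIZE = 4096  # hypothetical fixed size of the data pack file


def build_pack(path, values):
    """Builder: populate the file through a writable mapping, then unmap it."""
    with open(path, "w+b") as f:
        f.truncate(PACK_SIZE)  # reserve the whole range up front
        with mmap.mmap(f.fileno(), PACK_SIZE) as m:
            struct.pack_into("<I", m, 0, len(values))    # element count
            for i, v in enumerate(values):
                struct.pack_into("<q", m, 8 + 8 * i, v)  # 64-bit integers
            m.flush()  # push dirty pages to the file before unmapping


def read_pack(path):
    """Reader: map the frozen file read-only and decode values in place."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (count,) = struct.unpack_from("<I", m, 0)
            return [struct.unpack_from("<q", m, 8 + 8 * i)[0]
                    for i in range(count)]


def demo():
    """Build a pack in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.shilo")
        build_pack(path, [1, -2, 3])
        return read_pack(path)
```

The reader never loads the file as a whole: the OS pages in only the parts of the mapping that are actually touched, which is where the on-demand loading comes from. Python still boxes each decoded integer, so this only approximates the zero-copy access that compiled code would get.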

This is quite feasible with specialized code for data manipulation (e.g. Google's FlatBuffers), but what I really want is language/runtime-level integration that simplifies the workflow, so that it appears barely different from ordinary RAM usage to end programmers.

I would tentatively call such a data pack file a "shilo" (short for "Sharable Silo"), so that "data silos" of this kind can be a good thing rather than a bad one.

Code generation has to be customized so that each pointer in a shilo is stored relative to its own resident address; dereferencing such a pointer then needs one extra pointer addition. Once that is done, the file data can be mmap'ed at any RAM address, so relocation comes very cheap (one extra integer addition per dereference, with no extra cache burden).
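The self-relative pointer idea can be sketched as follows, again in Python rather than Koka and with a byte buffer standing in for the address space. The node layout `(value, next_rel)`, the use of 0 as the null marker, and the helper names `build_list`, `relocated`, and `walk` are assumptions for illustration only. The sketch assumes a non-empty list.

```python
import struct

NODE = struct.Struct("<qq")  # hypothetical node layout: (value, next_rel)
NEXT_FIELD = 8               # byte offset of the link field inside a node
NULL = 0                     # a relative offset of 0 marks the end of the list


def build_list(values):
    """Lay out a non-empty linked list using self-relative links."""
    buf = bytearray(NODE.size * len(values))
    for i, v in enumerate(values):
        here = i * NODE.size
        link_addr = here + NEXT_FIELD        # where the pointer itself lives
        if i + 1 < len(values):
            target = (i + 1) * NODE.size
            rel = target - link_addr         # relative to the pointer's own address
        else:
            rel = NULL
        NODE.pack_into(buf, here, v, rel)
    return bytes(buf)


def relocated(pack, base, total):
    """Simulate mmap'ing the pack at an arbitrary base address."""
    mem = bytearray(total)
    mem[base:base + len(pack)] = pack
    return mem


def walk(mem, base):
    """Traverse the list whose first node sits at address `base`."""
    out = []
    addr = base
    while True:
        value, rel = NODE.unpack_from(mem, addr)
        out.append(value)
        if rel == NULL:
            return out
        addr = (addr + NEXT_FIELD) + rel     # the one extra addition per dereference
```

Because every link is an offset from its own location, `walk(relocated(pack, 0, 256), 0)` and `walk(relocated(pack, 104, 256), 104)` traverse the same list with no fix-up pass over the bytes. A self-relative offset of 0 would point at the link field itself, which is never a valid node, so it is free to serve as the null marker.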

This looks very promising if built on what Koka has already done for effect-based, scoped resource tracking. It seems there is merely a value-level handle (i.e. the mmap resource) to be managed as an effect, and the rest would be almost the same, implementation-wise, as samples/handlers/named/heap.kk.

I'd like to work on this myself. PRs may be necessary to support relative-pointer assignment and dereferencing, while the other parts could come as a third-party library, or perhaps you would like them to go into the standard library?

But I'd first like to hear your thoughts on this feature, any difficulties I might have missed, and any other opinions or advice.

Off the top of my head, a shilo may need a top-level entry registry, perhaps a heterogeneous dictionary with string keys and values of various types exposed on the surface. Koka currently seems to lack such a structure. If it is possible at all, the registry may need to store "evidence" of the type on a per-value basis (like a generic Haskell value with an existentially quantified type that satisfies some constraint). Shilos should not store function pointers, though, for the sake of relocatability; this needs more thought.
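One possible shape for such a registry is sketched below in Python, with a small type tag stored next to each value as a crude stand-in for per-value type evidence. The entry layout, the three tags, and the names `pack_registry` and `lookup` are all hypothetical; a real design would more likely use a hash or sorted index and a richer type description.

```python
import struct

# Hypothetical type tags: a crude stand-in for per-value type evidence.
TAG_INT, TAG_FLOAT, TAG_STR = 1, 2, 3


def pack_registry(entries):
    """Serialize {key: value} as [count] then (key_len, key, tag, len, payload)*."""
    out = bytearray(struct.pack("<I", len(entries)))
    for key, value in entries.items():
        if isinstance(value, int):
            tag, payload = TAG_INT, struct.pack("<q", value)
        elif isinstance(value, float):
            tag, payload = TAG_FLOAT, struct.pack("<d", value)
        elif isinstance(value, str):
            tag, payload = TAG_STR, value.encode("utf-8")
        else:
            raise TypeError(f"unsupported value type: {type(value).__name__}")
        k = key.encode("utf-8")
        out += struct.pack("<H", len(k)) + k
        out += struct.pack("<BI", tag, len(payload)) + payload
    return bytes(out)


def lookup(buf, key):
    """Scan the registry for `key` and decode its value according to the tag."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    for _ in range(count):
        (klen,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        k = bytes(buf[pos:pos + klen]).decode("utf-8")
        pos += klen
        tag, plen = struct.unpack_from("<BI", buf, pos)
        pos += 5  # 1 byte tag + 4 byte payload length
        if k == key:
            if tag == TAG_INT:
                return struct.unpack_from("<q", buf, pos)[0]
            if tag == TAG_FLOAT:
                return struct.unpack_from("<d", buf, pos)[0]
            if tag == TAG_STR:
                return bytes(buf[pos:pos + plen]).decode("utf-8")
            raise ValueError(f"unknown type tag: {tag}")
        pos += plen  # skip this entry's payload
    raise KeyError(key)
```

The tags are plain data, so the registry holds no function pointers and stays relocatable. The price is that the reader must already know how to interpret each tag, which is weaker than carrying real type-class evidence with each value.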


Btw, I see that the modules under std/data/* are mostly in a "Todo" state, and that the implementations under v1/ are not compatible with the 3.x series, right?