implement the new KV interface

hannesm commented 5 years ago

and use the new mirage-kv-mem package internally. //cc @samoht

hannesm commented 5 years ago

I looked through the reverse dependencies on the impact of this change: using -m plain mode and only read on the generated module (size was removed in above commits):

datakit-ci
graphql-lwt/graphql-cohttp
iocaml
irmin-http/irmin
mirage-seal
pds
stone unclear (will upper bound these)
mirage
mirage-www

the main interface change is for the -m lwt mode, as used by the mirage utility (which emits version constraints of >=2.0.0 & <3.0.0 for all the unikernels).

samoht commented 5 years ago

I'm not sure to remember how this used to work, but it seems that now you are loading all the data in a mirage-kv in-memory store instead of doing this lazily ? Does it have a performance impact on startup time ?

hannesm commented 5 years ago

@samoht this is correct. I was not aware that crunch did this interesting trick for sort of lazy loading. I tried to measure a difference in the startup time by using the mirage-skeleton/kv_ro example and extending the to-be-crunched store (subdirectory t).

test 1 adding a single big (10MB) file

no difference on neither unix nor hvt

test 2 adding 1000 files (all with the contents "foo")

on unix, no measurable difference
old crunch and hvt: 0.000 user time 0.011 system time
new crunch and hvt: 0.000 user time 0.038 system time

I'm curious what are the use cases for crunch(+mirage/lwt) you have in mind? I usually use it for a subdirectory with two files (server.pem and server.key). Crunch has its limit when looking into the size of the generated module (see https://github.com/mirage/mirage/issues/396), and is not well-suited for arbitrary amounts (in size or number) of files.

The underlying issue AFAICT is that the new kv API uses paths as keys, and requires a list function -- earlier keys were strings, and we didn't need list functionality.

I can see multiple ways forward:

rewrite crunch so that it generates a nested structure (keys are now paths) of dictionaries
- this would disallow code sharing between mirage-kv-mem and the generated crunch output
- this would break the -m plain option API (or, if we would like to keep that one stable, we can write wrappers for that)
- at initialisation time the data structures would need to be allocated and filled (there's only literal syntax for lists and arrays, none for hashtables/maps))
rewrite this patch to not use mirage-kv-mem, and instead:
- implement list as a filter on the file_list (will be an expensive (O(number of files)) operation if lots of files/nested directories are around)
- implement get to convert the key to a string and use file_chunks
- implement exists in a similar fashion as list (will also be rather expensive (O(number of files)))
leave it as is, and improve later (once we have a concrete demand for improving its performance and know characteristics of access patterns).

what do you think? //cc @linse

samoht commented 5 years ago

Given the low perf hit vs. gain in simplicity I think I am fine with this patch. We should probably add an mirage-kv-bench at one point to improve this across all the kv implementations.

hannesm commented 5 years ago

@samoht sounds good to me. we should as well write tests for the mirage-kv interface, and look that the different implementations behave as we expect them to behave :)

mirage / ocaml-crunch

implement the new KV interface #47