Closed hannesm closed 5 years ago
I looked through the reverse dependencies on the impact of this change:
using -m plain
mode and only read
on the generated module (size
was removed in above commits):
the main interface change is for the -m lwt
mode, as used by the mirage utility (which emits version constraints of >=2.0.0 & <3.0.0
for all the unikernels).
I'm not sure to remember how this used to work, but it seems that now you are loading all the data in a mirage-kv in-memory store instead of doing this lazily ? Does it have a performance impact on startup time ?
@samoht this is correct. I was not aware that crunch did this interesting trick for sort of lazy loading.
I tried to measure a difference in the startup time by using the mirage-skeleton/kv_ro
example and extending the to-be-crunched store (subdirectory t
).
test 1 adding a single big (10MB) file
test 2 adding 1000 files (all with the contents "foo")
I'm curious what are the use cases for crunch(+mirage/lwt) you have in mind? I usually use it for a subdirectory with two files (server.pem and server.key). Crunch has its limit when looking into the size of the generated module (see https://github.com/mirage/mirage/issues/396), and is not well-suited for arbitrary amounts (in size or number) of files.
The underlying issue AFAICT is that the new kv API uses paths as keys, and requires a list
function -- earlier keys were strings, and we didn't need list functionality.
I can see multiple ways forward:
crunch
so that it generates a nested structure (keys are now paths) of dictionaries
-m plain
option API (or, if we would like to keep that one stable, we can write wrappers for that)list
as a filter on the file_list
(will be an expensive (O(number of files)
) operation if lots of files/nested directories are around)get
to convert the key to a string
and use file_chunks
exists
in a similar fashion as list
(will also be rather expensive (O(number of files)
))what do you think? //cc @linse
Given the low perf hit vs. gain in simplicity I think I am fine with this patch. We should probably add an mirage-kv-bench
at one point to improve this across all the kv implementations.
@samoht sounds good to me. we should as well write tests for the mirage-kv interface, and look that the different implementations behave as we expect them to behave :)
and use the new mirage-kv-mem package internally. //cc @samoht