Thanks for the PR and sorry for the lack of replies so far. I'll start having a better look sometime this week and will return with more comments.
Hi, after having a better look at this and thinking some more about it, it looks like a good opportunity to introduce a more forward-looking and extensible `Options` (actual name TBD) abstraction that we'll be able to use as additional settings appear in the future. For example, how about introducing a struct (let's just call it `Options` for now) that holds both `file_offset` and `page_size_policy` (all the current non-mandatory memory region parameters, instead of just the page size policy)? Then, we change the bounds of `A` from `GuestMemoryMmap::from_ranges_with_options` to be `A: Borrow<(GuestAddress, usize, Options)>`.

It now becomes simpler to implement more configuration possibilities by adding more fields to `Options`. We can further avoid breaking changes by, for example, keeping the current signature of `build` from `mmap_unix` and similarly introducing a new method that is more configurable. I'm not sure precisely what its signature should be so that we can harmonize it with the existing `build` and `build_raw` with minimal additions, but we can definitely find something if we spend a bit more time here. What do you think about the high-level approach?
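To make this concrete, here is a minimal sketch of what such an `Options` bundle and the widened bound could look like. Everything below (names, fields, and the commented-out signature) is a placeholder for discussion, not existing vm-memory API:

```rust
// Hedged sketch only: the struct, its fields, and the signature in the trailing
// comment are placeholders for discussion, not the actual vm-memory API.

/// Hypothetical bundle of all non-mandatory per-region parameters.
/// (In vm-memory the first field would use the existing `FileOffset` type;
/// a dependency-free stand-in is used here so the sketch compiles on its own.)
#[derive(Default)]
pub struct Options {
    pub file_offset: Option<(std::fs::File, u64)>,
    pub page_size_policy: PageSizePolicy,
    // Future optional settings can be added here without new constructors.
}

/// Hypothetical page size policy, mirroring the enum added in this PR.
#[derive(Default)]
pub enum PageSizePolicy {
    #[default]
    BasePages,
    TransparentHugepages,
    ExplicitHugepages,
}

// The widened bound on the constructor's range type would then read roughly:
//
//     pub fn from_ranges_with_options<A, T>(ranges: T) -> Result<Self, Error>
//     where
//         A: Borrow<(GuestAddress, usize, Options)>,
//         T: IntoIterator<Item = A>;
```

The main point is that new optional settings become new fields with defaults rather than new constructor overloads.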
I like the high-level idea of a compact `Options` struct to consolidate optional arguments when mapping guest memory. The approach feels much more easily extensible than having to add new functions for each new option. A few thoughts:

1) Backward compatibility of the interface <=> backward compatibility of the `Options` struct. Renaming fields will break user code, and users will have to be careful to instantiate options as `Options { file_offset: ..., ..Options::default() }` or some similar construct to ensure that adding new fields to `Options` does not break existing user code. Maybe we could write a macro constructor with default arguments (see the sketch after this list).

2) Questions arise about how high/low in the interface these `Options` should live. An `mmap_unix`-specific set of options would allow a lot more customizability of guest mappings on *nix hosts, and likely similarly for Windows. A generic `mmap` set of options would need to either miss some OS-specific mapping features, or contain options for them anyway, bloating the struct. A similar story occurs between `mmap` and `guest_memory`. For example, Firecracker seems to exclusively use `GuestMemoryMmap`, ignoring the higher-level abstractions. I think the most likely home for `Options` would be at the `mmap` level, containing the intersection of useful features from the different OSes (or maybe an OS-specific option when it's really useful).
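To illustrate the concern in point 1, here is a small self-contained sketch (stand-in field types, not the real struct) showing why functional update syntax keeps downstream code compiling when fields are added, plus a builder as one alternative to the macro-constructor idea:

```rust
// Illustrative only: stand-in types, not the real vm-memory Options.
#[derive(Default)]
pub struct Options {
    pub file_offset: Option<u64>, // stand-in for Option<FileOffset>
    pub page_size_policy: u32,    // stand-in for a PageSizePolicy enum
}

impl Options {
    /// Builder-style setter: downstream code never names the full field list,
    /// so adding fields later cannot break it.
    pub fn with_file_offset(mut self, offset: u64) -> Self {
        self.file_offset = Some(offset);
        self
    }
}

fn main() {
    // Functional update syntax: any fields added to Options later are filled
    // from the default, so this keeps compiling; renaming or removing
    // `file_offset` would still break it.
    let literal = Options {
        file_offset: Some(0),
        ..Options::default()
    };

    // Builder alternative with the same "defaults plus overrides" ergonomics.
    let built = Options::default().with_file_offset(0);

    assert_eq!(literal.file_offset, built.file_offset);
}
```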
This PR adds a `PageSizePolicy` enum which can be used to configure the behavior of `MmapRegion` instances upon creation. The current options are:

- `BasePages`, which uses the default `mmap` behavior
- `TransparentHugepages`, which uses THP via `madvise`
- `ExplicitHugepages`, which passes `MAP_HUGETLB` to `mmap`
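For context, here is a rough, hedged sketch of how these three policies might translate into `mmap`/`madvise` calls on Linux; it is illustrative only and may not match the exact code in this PR:

```rust
// Illustrative sketch only; the actual implementation in the PR may differ.
use std::io;

pub enum PageSizePolicy {
    BasePages,
    TransparentHugepages,
    ExplicitHugepages,
}

fn map_region(size: usize, policy: &PageSizePolicy) -> io::Result<*mut libc::c_void> {
    let mut flags = libc::MAP_ANONYMOUS | libc::MAP_PRIVATE | libc::MAP_NORESERVE;
    if matches!(policy, PageSizePolicy::ExplicitHugepages) {
        // Ask for a hugetlbfs-backed mapping up front.
        flags |= libc::MAP_HUGETLB;
    }

    // SAFETY: anonymous mapping with no associated fd; the returned pointer is
    // checked against MAP_FAILED before use.
    let addr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            size,
            libc::PROT_READ | libc::PROT_WRITE,
            flags,
            -1,
            0,
        )
    };
    if addr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }

    if matches!(policy, PageSizePolicy::TransparentHugepages) {
        // Hint the kernel to back this range with transparent hugepages.
        // SAFETY: addr/size describe the mapping created just above.
        if unsafe { libc::madvise(addr, size, libc::MADV_HUGEPAGE) } < 0 {
            return Err(io::Error::last_os_error());
        }
    }

    Ok(addr)
}
```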
The page size policy is at a per-mapping granularity, so a given VM can have different policies for different mappings, if desired. A helper function, `from_ranges_with_policy`, allows the VMM to configure mapping policies when constructing mappings from address ranges; a hypothetical usage sketch appears below.

The behavior is stubbed out in the `mmap_windows.rs` implementation, but the arguments are still accepted there so that the file compiles.

The intention of this PR is to implement the functionality required for hugepage support in vm-memory (for #118), but without invasively modifying the external API, pending further discussion. This is mostly accomplished by adding extra constructors/functions which take a `PageSizePolicy` as an argument, while leaving existing functions alone.
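As a hypothetical usage sketch (the exact signature of `from_ranges_with_policy` and the path of `PageSizePolicy` are assumptions, not necessarily what the PR exposes), a VMM might mix policies per region like this:

```rust
// Assumed usage only: the helper's exact signature and the PageSizePolicy path
// may differ in the PR; this just shows the per-mapping granularity.
use vm_memory::mmap::PageSizePolicy; // added by this PR; path assumed
use vm_memory::{GuestAddress, GuestMemoryMmap};

fn build_guest_memory() -> GuestMemoryMmap {
    GuestMemoryMmap::from_ranges_with_policy(&[
        // Small, low region on normal base pages.
        (GuestAddress(0), 0x10_0000, PageSizePolicy::BasePages),
        // Large main RAM region backed by transparent hugepages.
        (
            GuestAddress(0x10_0000),
            0x4000_0000,
            PageSizePolicy::TransparentHugepages,
        ),
    ])
    .expect("failed to create guest memory")
}
```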
These changes were experimentally tested for Firecracker by modifying Firecracker's local fork of vm-memory to include similar changes. Empirically, we were able to boot an Alpine microvm up to ~37% faster and an Ubuntu microvm up to ~30% faster, and to reduce on-boot page faults by ~70×, all with negligible extra memory overhead. We would like to implement these changes here, so that other VMMs built on vm-memory can hopefully reap the same benefits.
A few design questions we're not too sure about:
- What is the correct way to expose this configuration to VMMs? The way we've chosen is to add a special `from_ranges`-style function in `mmap.rs` that accepts a policy per tuple. However, as of recently, Firecracker is using a slightly modified local fork of vm-memory to implement dirty bit tracking, in addition to this repo. The local fork complicates how (and where) types are used/defined (for example, it overrides `mmap.rs`), making it difficult to utilize these changes downstream. We're not sure how other downstreams of this repo use the APIs, either. Advice on improving the API is much appreciated.

- At which level of abstraction should this policy live? We decided to place the enum in `mmap.rs`, since paging policy is closely related to mmapped regions and not so closely related to `GuestMemory` in general. However, by attaching a policy to all `MmapRegion`s, we're requiring a Windows implementation (which, for now, is just a no-op). Would it make sense to implement an `ExplicitHugepages` policy for Windows, if the OS provides that kind of control?