vmware / splinterdb

High Performance Embedded Key-Value Store
https://splinterdb.org
Apache License 2.0

Core changes to support running Splinter with allocated shared memory. #567

Closed gapisback closed 1 year ago

gapisback commented 1 year ago

Support for running SplinterDB with shared memory configured for most memory allocation is an -EXPERIMENTAL- feature added with this commit.

This commit brings in basic support to create a shared memory segment and to redirect all memory-allocation primitives to shared memory. Currently, we support only a simplistic memory-management scheme; i.e., only allocations, no frees. With shared segments of 1-2 GiB we can run many functional and unit tests.

The high points of the changes are:

netlify[bot] commented 1 year ago

Deploy Preview for splinterdb canceled.

Latest commit: 2c4e5331ae054e0a6c34bab15e7804c78f78548f
Latest deploy log: https://app.netlify.com/sites/splinterdb/deploys/64ee8f331924190008a61e11
gapisback commented 1 year ago

Note to the reviewers: @rtjohnso @ajhconway @rosenhouse :

The entire shared-memory support is being parceled out as a 3-part change-set. This is the 1st part of the whole work.

  1. Core shared memory support
  2. Support for multi-process, forked-processes executing with shared memory
  3. Support for free-fragment recycling, improved shared memory management.

I am working on putting together the PRs for (2) and (3).

Here is the order to review files in, to get a good grip on this change-set:

  1. platform.h, platform.c
  2. shmem.h, shmem.c: platform_shmcreate(), platform_shmdestroy(), platform_shm_alloc(), platform_shm_free()
  3. Misc other header files
  4. Review test.sh to gauge the scope of changed and enhanced testing brought in with this work.
gapisback commented 1 year ago

Hi, @rtjohnso - I have finally completed the work to address all (well, most) of your review comments, and have also rebased against the latest /main.

Here's the overall status of this update cycle:

  1. A couple of rebases against /main went through smoothly; i.e., there were no conflicts, so I don't think there was any major code churn due to this work. There was one small merge issue in trunk.c which I found and fixed during test cycles.
  2. Removed most usages of platform_heap_handle from shmem.c and from many external interfaces. There are some vestigial references that will need to be cleaned up in the future.
  3. The splinterdb_create() const issue has been fixed. A new test has been added to exercise these gymnastics.
  4. The writable_buffer_resize() interface change has been corrected.
  5. Other minor cleanup, and fixes for a few bugs that you spotted.

There are a few things I didn't take on, as they would simply balloon this change-set, and they are not super-critical (I think). Some examples:

  1. Merging the parsing of --use-shmem into config_parse(), and using only that interface to parse this argument.
  2. Some other code cleanup ... that I don't recall now.

I expect the CI runs to complete ... there IS a lot of new testing going on, so I won't be surprised if there are still some failures. I'll watch and stabilize them.

This updated PR is now ready for a final review. Hopefully we can close out the loose threads and try to land this largish piece of work.

gapisback commented 1 year ago

@rtjohnso - I have addressed the last remaining comment that we agreed, during our discussion yesterday, needs to be fixed.

Please see this commit, and let me know if this version of the fix is less unacceptable to you.

The new unit-test case added in this commit tripped up in a CI-job, in non-shared-memory execution. I patched an assertion under this commit.

I think it's fine to have this white-box type of unit-test case to clearly understand the gyrations of shared memory allocation. If you strongly object to this tweaking, I can just delete this specific assertion check in the affected test-case, unit/writable_buffer_test.c, but that would not be my 1st choice:

    // Currently, reallocation from shared-memory will not reuse the existing
    // memory fragment, even if there is some room in it for the append. (That's
    // an optimization which needs additional memory fragment-size info which
    // is currently not available to the allocator.)
    if (data->use_shmem) {
       const void *new_data2 = writable_buffer_data(wb);
       ASSERT_TRUE((new_data != new_data2));
    }

For the rest -- the ball is in your court. I await any other changes you'd like to see or other things you'd like to discuss. Thanks.

gapisback commented 1 year ago

@rtjohnso - I've addressed your last batch of comments that needed code changes.

Only a couple, to do with config-parsing, were not addressed -- I've left my reasons in the responses on the appropriate files.

The one thing remaining is: What would you like the random value to define SPLINTERDB_SHMEM_MAGIC to be?

I could not think of anything better than what's already in the code. In any case, it's only for diagnostics, so does it really matter that the value be more random than what it's currently defined to be?

gapisback commented 1 year ago

Summary comments:

gapisback commented 1 year ago

Hi, @rtjohnso -- I investigated the issues around padding for cache-alignment that you raised in this review comment and in this comment.

The affected code chunk in /main is this one, and it has been in the code base since the initial commit that moved this code into GitHub.

static inline void *
platform_aligned_malloc(const platform_heap_id UNUSED_PARAM(heap_id),
                        const size_t           alignment, // IN
   ...
   const size_t padding = (alignment - (size % alignment)) % alignment;
   return aligned_alloc(alignment, size + padding);

I looked at the semantics of aligned_alloc(), and here is what I found in the GNU libc docs (Aligned-Memory-Blocks.html):

The aligned_alloc function allocates a block of size bytes whose address is a multiple of alignment. The alignment must be a power of two and size must be a multiple of alignment. 

Linux man page also documents something similar:

The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment. 

I think this answers a question you raised in a previous comment; i.e. "Alignment is a request on the position of the start of the returned allocation. Why would we need to pad the size?"

Based on what the GNU and Linux docs state, I don't think it would be right (as you suggested in our discussion) to delete the padding code as it exists today for the non-shared allocation interface.

Technically, I could skip this padding logic in platform_aligned_malloc() for the shared-memory allocator and instead handle the alignment inside platform_shm_alloc(). But I do think it's a reasonable choice to keep both schemes symmetric in this regard.

Given that this is the documented behaviour of aligned_alloc(), do you still see the need to rework the shared-memory based allocator, which currently also inherits this padding-up of the requested size?


In any case, I did try to actually re-work platform_shm_alloc() to receive an alignment argument and then to do the alignment of shm->shm_next inside platform_shm_alloc(). It will need a bit of pointer-arithmetic / bit-masking and boundary condition handling to get this logic right.

But before embarking on finalizing that code rework, I wanted to raise this semantics issue vis-a-vis aligned_alloc() to see if it prompts a rethink on your side.

As mentioned earlier, having to rework this now will make it a bit more tricky to merge with the upcoming memfrag{}-based memory allocation / free interfaces (in part-3).

If you still do think that this padding business must be reworked for shared-memory based allocation, let me suggest that we revisit that once part-3 lands, at which time this whole memory alloc / free picture will become clearer.

--

Let me know your thoughts based on this new information.

gapisback commented 1 year ago

Hi, @rtjohnso -- Barring the one remaining issue of how to handle requests for aligned memory (see my previous response), here is the status of the remaining rework based on our discussion this past week.

All items you raised as needing changes have been addressed, with either code-rework, test-changes or extended enablement of test executions via test.sh.

The main changes arising in the last round of updates, addressing your review comments, are summarized in this entry:

test.sh has a new driver function, run_tests_with_shared_memory(), which invokes the following sub-tests with the --use-shmem option.

[1] Unit tests: unit_test "--use-shmem" and run_slower_unit_tests "--use-shmem" invoke these unit-tests.

cc-sdb-vm:[69] $ build/release/bin/unit_test --list

List of test suites that can be run with shared-memory configured:

  task_system                   - Yes
  splinterdb_stress             - Yes
  splinterdb_quick              - Yes
  splinterdb_heap_id_mgmt       - Yes
  splinter_shmem                - Yes
  limitations                   - Yes
  btree                         - Yes
  btree_stress                  - Yes
  writable_buffer               - Yes
  util                          - No
  platform_api                  - No
  misc                          - No
  splinter                      - Yes

 - unit/config_parse_test.c : Not relevant to feature
 - unit/misc_test.c : Not relevant to feature
 - unit/platform_apis_test.c: Does not add value to convert this test;
                              platform_heap_create() with use_shmem is
                              tested in many other tests.
 - unit/util_test.c: Not relevant to feature

[2] Functional tests that can be run with shared-memory configured

See calls to run_splinter_perf_tests(), run_btree_tests() and run_other_driver_tests() in test.sh

 - splinter_test : Yes; Both --functionality and --perf
 - btree_test    : Yes; Both basic and --perf
 - cache_test    : Yes
 - log_test      : Yes
 - filter_test   : Yes