scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.37k stars 1.55k forks source link

Seastar's `posix_memalign` may return unaligned memory #1940

Open dotnwat opened 12 months ago

dotnwat commented 12 months ago

System posix_memalign allows allocation sizes that are less than the requested alignment. However, with Seastar's implementation the allocation is performed without error, and the resulting allocation may not be aligned.

For users that are using posix_memalign to allocate memory for DMA this inconsistency is unlikely to ever be encountered, but as more and more non-Seastar code (e.g. JIT) is allowed to run on reactor with the Seastar allocator implementation the more likely it is to encounter code which assumes full compliance.

Here is a reproducer for the issue. Works in debug builds with system allocator, fails in release builds with Seastar allocator.

diff --git a/tests/unit/alloc_test.cc b/tests/unit/alloc_test.cc
index 847ccec1..5e7ebb86 100644
--- a/tests/unit/alloc_test.cc
+++ b/tests/unit/alloc_test.cc
@@ -104,6 +104,40 @@ SEASTAR_TEST_CASE(test_temporary_buffer_aligned) {
     return make_ready_future<>();
 }

+SEASTAR_TEST_CASE(test_posix_memalign) {
+    auto verify = [](size_t alignment, size_t size) {
+        void *p = NULL;
+        int result = posix_memalign(&p, alignment, size);
+        BOOST_REQUIRE(result == 0);
+        BOOST_REQUIRE(p != nullptr);
+        BOOST_REQUIRE_MESSAGE(
+            (reinterpret_cast<uintptr_t>(p) % alignment) == 0,
+            fmt::format("failed with p {} alignment {} size {}",
+                        fmt::ptr(p), alignment, size));
+               free(p);
+    };
+
+    // posix_memalign does not place a restriction on the allocated size, unlike
+    // aligned_alloc which requires size to be a multiple of alignment. at the
+    // time this test was written verify(32,16) does not fail without the
+    // proceeding allocation.
+    //
+    // this case can be seen in both the glibc test suite:
+    //   https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-posix_memalign.c;h=5039b6501710a550e71f1bc659b1b5c0baa03cf3;hb=HEAD#l109
+    //
+    // and apple's libmalloc test suite:
+    //   https://opensource.apple.com/source/libmalloc/libmalloc-283.60.1/tests/posix_memalign_test.c.auto.html
+    //
+    // at the time the test is written it will fail on release builds with
+    // seastar allocator and pass when using the default allocator in debug
+    // mode.
+
+    verify(16, 32);
+    verify(32, 16);
+
+    return make_ready_future<>();
+}
+
nyh commented 12 months ago

Good catch. Almost 10 years ago we had the same bug in the posix_memalign() implementation in the OSv project, and fixed it in https://github.com/cloudius-systems/osv/commit/c8845bb544b37438622bf2e7b8bcd7c7874dda93. Our workaround there did something simple - it just increased the requested size to match the alignment. Maybe we should do the same here, unless you can think of a better way.

dotnwat commented 12 months ago

it just increased the requested size to match the alignment.

I did the same thing, but had the luxury of noticing it and controlling the call sites, so adding that strategy to Seastar could work if the wasted space wasn't a concern. I don't know enough about how the allocator works to suggest something more precise.

nyh commented 12 months ago

Yes, I meant the solution in that 10-year-old patch was indeed to increase the requested size inside posix_memalign() - not in all its callers.

I know it looks like it wastes space, but I'm not sure we have a better solution - imagine that you ask for an allocation of size 16 aligned at 32 bytes. If we take a 32-byte-aligned chunk and pretend to have allocated only the first 16 bytes of it, that would be fine, but what will we do with the other 16-byte half? How will we remember we haven't allocated this yet? I guess it's possible to build a linked list of such half-allocated chunks, but I wonder if it's worth the effort (in my experience this use case is very rare - is it common in your application? If it's rare, it must be done correctly but utmost efficiency is less important). Maybe @avikivity will have a better idea.

travisdowns commented 12 months ago

Although I can imagine approaches that don't waste the space, the waste approach seems perfect here because it's only happening in the unusual case that someone actually asks for alignment > size, which we believe must be quite rare since no-one has complained of it not working in almost a decade.