Closed MisterDA closed 3 months ago
I would be happy to approve and merge this PR if the recommendations of Xavier are implemented:
The target systems we care about are 64-bit architecture with alignment constraints <= 16 and malloc returning 16-aligned blocks. In this case, the
data
part ofstruct pool_block
is naturally 16-aligned (because the two pointers before use 16 bytes), and nothing needs to be done. Aligningdata
usingmax_align_t
should have no effect. (*)For a 32-bit architecture with 16-byte alignment constraints and malloc returning 16-aligned blocks (e.g. Linux x86-32), aligning
data
to 16 seems preferable to me and can be achieved by usingmax_align_t
.For a 32-bit architecture with 16-byte alignment constraints and malloc returning 8-aligned blocks (perhaps Windows 32 bits, not sure): no amount of alignment constraints in
struct pool_block
will give 16-aligneddata
fields, so you could just as well put no alignment constraints.
(The footnote (*)
is about the magic number. Given that this is a debug-only feature, which results at worst in a bit of wasted space due to extra alignment, it seems reasonable to do the easy thing which is to leave it alone.)
I am not sure whether the current state of the PR corresponds to this explicit design choice. (I think it does, but in a non-obvious way.) @MisterDA, can you confirm? Could you maybe include a comment that explains the current design/intent, possibly by just quoting the description of Xavier above? (If you just quote, you could mark him as author of the corresponding git commit.)
I am not sure whether the current state of the PR corresponds to this explicit design choice. (I think it does, but in a non-obvious way.) @MisterDA, can you confirm?
I'm not sure either, I'll need some time to convince myself.
Could you maybe include a comment that explains the current design/intent, possibly by just quoting the description of Xavier above?
I'm thinking of turning the bullets points into some sort of static assertions, to be added to the code or the configure script.
Coming back to this PR:
max_align_t
is technically correct (and even if, on some compilers, max_align_t
is not good enough for SSE data, it is not worse than the existing union
).pool_block
get obtained with malloc
, it makes no sense to have pool_block
require a larger alignment than what malloc
provides.A somewhat simple way to address this would be to:
struct pool_block
altogetherSIZEOF_POOL_BLOCK
as a computation rounding sizeof(struct pool_block)
to the intended alignment (supposedly 16
on amd64 due to SSE, max_align_t
elsewhere).pool_block
using C11 aligned_alloc
, using the above decided alignment.data
field of struct pool_block
with pointer arithmetic using SIZEOF_POOL_BLOCK
above.This could be done either in this PR or in a forthcoming PR (since this one currently doesn't change anything about the actual alignment of these allocations).
I agree with @dustanddreams conclusions. I'd like to experiment with aligned_alloc
in a follow-up PR, if possible.
Unfortunately we cannot use aligned_alloc
as the blocks can be resized. This is currently done with realloc
, but there is no POSIX or C11 aligned realloc
. We cannot hand-roll our own realloc
, because the original size of the data is lost (unless we duplicate its size in struct pool_block
). Does that make sense?
Unfortunately we cannot use
aligned_alloc
as the blocks can be resized. This is currently done withrealloc
, but there is no POSIX or C11 alignedrealloc
. We cannot hand-roll our ownrealloc
, because the original size of the data is lost (unless we duplicate its size instruct pool_block
). Does that make sense?
That's unfortunate.
This means the (few) current users of caml_stat_resize
and caml_stat_resize_noexc
need to be inspected; if they do not require any particular alignment the use of realloc
is safe and a comment should be added to mention that it's ok; otherwise we'll have to roll our own realloc
-with-alignment flavour. (and yes, this involves figuring out what the original allocation size was - glibc has malloc_usable_size
but that's a non-portable interface)
I've added more code that uses C11 aligned_alloc
, free
or Microsoft _aligned_malloc
, _aligned_free
, _aligned_realloc
.
The default alignment of struct pool_block
and its data
field is now the alignment of max_align_t
in the general case, or 16 on amd64 if alignof(max_align_t)
is smaller than 16.
As the libc doesn't provide an aligned realloc
(Microsoft does), this needs a little custom realloc
implementation in caml_stat_resize
to keep the buffer aligned. We've considered using malloc_usable_size
or Apple's malloc_size
, but they're non-portable and intended towards telemetry rather than custom realloc
, so we need to keep track of the requested alloc size, and copy the data manually.
@xavierleroy May I ask for another review? I hope to have addressed your concerns by switching to aligned_alloc
.
I had a quick look at the current code, so maybe I missed something.
I think it's a good idea to use _aligned_alloc
for Win32, because it addresses both a Win32-specific problem (malloc
result possibly insufficiently aligned) and a MSVC-specific problem (max_align_t
not defined).
For all other platforms, I'd rather keep your original code, the one that uses plain malloc
and max_align_t
, under the assumption that all other platforms do the right thing, namely:
max_align_t
is the biggest alignment enforced by the processor, andmalloc
returns max_align_t
-aligned blocks.Relatedly, your original code avoids the need for storing the size in the block header, something I'm not a fan of because it increases memory usage.
Thanks for the review. I've kept aligned malloc
/realloc
/free
on Windows, and regular malloc
/realloc
/free
on other platforms, this removed the need to keep track of the allocation size.
Some cleanups removing checks and workarounds for older compilers, assuming that the compiler supports C11 or C++11 out of the box. We may use
_Alignas
(since C11) oralignas
(since C23) directly, and use themax_align_t
type. Unfortunately, support formax_align_t
is missing from the Windows C standard library.