ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org
Other
5.47k stars 1.1k forks source link

Simplify `CAMLalign` and improve alignment of `caml_stat_block` with C11 `max_align_t` and `aligned_alloc` #13139

Closed MisterDA closed 3 months ago

MisterDA commented 6 months ago

Some cleanups removing checks and workarounds for older compilers, assuming that the compiler supports C11 or C++11 out of the box. We may use _Alignas (since C11) or alignas (since C23) directly, and use the max_align_t type. Unfortunately, support for max_align_t is missing from the Windows C standard library.

gasche commented 4 months ago

I would be happy to approve and merge this PR if the recommendations of Xavier are implemented:

  • The target systems we care about are 64-bit architecture with alignment constraints <= 16 and malloc returning 16-aligned blocks. In this case, the data part of struct pool_block is naturally 16-aligned (because the two pointers before use 16 bytes), and nothing needs to be done. Aligning data using max_align_t should have no effect. (*)

  • For a 32-bit architecture with 16-byte alignment constraints and malloc returning 16-aligned blocks (e.g. Linux x86-32), aligning data to 16 seems preferable to me and can be achieved by using max_align_t.

  • For a 32-bit architecture with 16-byte alignment constraints and malloc returning 8-aligned blocks (perhaps Windows 32 bits, not sure): no amount of alignment constraints in struct pool_block will give 16-aligned data fields, so you could just as well put no alignment constraints.

(The footnote (*) is about the magic number. Given that this is a debug-only feature, which results at worst in a bit of wasted space due to extra alignment, it seems reasonable to do the easy thing which is to leave it alone.)

I am not sure whether the current state of the PR corresponds to this explicit design choice. (I think it does, but in a non-obvious way.) @MisterDA, can you confirm? Could you maybe include a comment that explains the current design/intent, possibly by just quoting the description of Xavier above? (If you just quote, you could mark him as author of the corresponding git commit.)

MisterDA commented 4 months ago

I am not sure whether the current state of the PR corresponds to this explicit design choice. (I think it does, but in a non-obvious way.) @MisterDA, can you confirm?

I'm not sure either, I'll need some time to convince myself.

Could you maybe include a comment that explains the current design/intent, possibly by just quoting the description of Xavier above?

I'm thinking of turning the bullets points into some sort of static assertions, to be added to the code or the configure script.

dustanddreams commented 4 months ago

Coming back to this PR:

A somewhat simple way to address this would be to:

This could be done either in this PR or in a forthcoming PR (since this one currently doesn't change anything about the actual alignment of these allocations).

MisterDA commented 4 months ago

I agree with @dustanddreams conclusions. I'd like to experiment with aligned_alloc in a follow-up PR, if possible.

MisterDA commented 4 months ago

Unfortunately we cannot use aligned_alloc as the blocks can be resized. This is currently done with realloc, but there is no POSIX or C11 aligned realloc. We cannot hand-roll our own realloc, because the original size of the data is lost (unless we duplicate its size in struct pool_block). Does that make sense?

dustanddreams commented 4 months ago

Unfortunately we cannot use aligned_alloc as the blocks can be resized. This is currently done with realloc, but there is no POSIX or C11 aligned realloc. We cannot hand-roll our own realloc, because the original size of the data is lost (unless we duplicate its size in struct pool_block). Does that make sense?

That's unfortunate.

This means the (few) current users of caml_stat_resize and caml_stat_resize_noexc need to be inspected; if they do not require any particular alignment the use of realloc is safe and a comment should be added to mention that it's ok; otherwise we'll have to roll our own realloc-with-alignment flavour. (and yes, this involves figuring out what the original allocation size was - glibc has malloc_usable_size but that's a non-portable interface)

MisterDA commented 3 months ago

I've added more code that uses C11 aligned_alloc, free or Microsoft _aligned_malloc, _aligned_free, _aligned_realloc. The default alignment of struct pool_block and its data field is now the alignment of max_align_t in the general case, or 16 on amd64 if alignof(max_align_t) is smaller than 16. As the libc doesn't provide an aligned realloc (Microsoft does), this needs a little custom realloc implementation in caml_stat_resize to keep the buffer aligned. We've considered using malloc_usable_size or Apple's malloc_size, but they're non-portable and intended towards telemetry rather than custom realloc, so we need to keep track of the requested alloc size, and copy the data manually.

MisterDA commented 3 months ago

@xavierleroy May I ask for another review? I hope to have addressed your concerns by switching to aligned_alloc.

xavierleroy commented 3 months ago

I had a quick look at the current code, so maybe I missed something.

I think it's a good idea to use _aligned_alloc for Win32, because it addresses both a Win32-specific problem (malloc result possibly insufficiently aligned) and a MSVC-specific problem (max_align_t not defined).

For all other platforms, I'd rather keep your original code, the one that uses plain malloc and max_align_t, under the assumption that all other platforms do the right thing, namely:

Relatedly, your original code avoids the need for storing the size in the block header, something I'm not a fan of because it increases memory usage.

MisterDA commented 3 months ago

Thanks for the review. I've kept aligned malloc/realloc/free on Windows, and regular malloc/realloc/free on other platforms, this removed the need to keep track of the allocation size.