qinsoon opened this issue 1 year ago
As we use `portable_atomic` for 128-bit atomics, they list the support for different platforms: https://github.com/taiki-e/portable-atomic/blob/HEAD/src/imp/atomic128/README.md

- `LDXP`/`STXP` loop: from ARMv8 (2011)
- `CASP` (DWCAS): added as FEAT_LSE (mandatory from armv8.1-a) (2014)
- `LDP`/`STP` (DW load/store): if FEAT_LSE2 (optional from armv8.2-a, mandatory from armv8.4-a) (2017)
- `LDIAPP`/`STILP` (DW acquire-load/release-store): added as FEAT_LRCPC3 (optional from armv8.9-a/armv9.4-a)
- `LDCLRP`/`LDSETP`/`SWPP` (DW RMW): added as FEAT_LSE128 (optional from armv9.4-a)

For x86_64 and aarch64, we should be fine. On platforms that use the fallback implementation, we will have a performance issue.
I confirmed using a cross-compilation toolchain that on aarch64-unknown-linux-gnu, `AtomicU128` uses an LL/SC loop (`LDXP`, `STXP` and `CBNZ`) to implement both atomic load and atomic store. (For a load, it stores the same value back and checks whether the store succeeds; for a store, it stores the new value.) This is bad because it effectively turns every load into a load-store pair.
On riscv64gc-unknown-linux-gnu, AtomicU128 is not lock-free.
If we would like to provide an efficient implementation for platforms that do not support 128-bit atomics, we should consider not using `&dyn SFT`. We need a way to represent an SFT entry as a normal pointer.
I think there are many possibilities.
For the generous heap layout where each space occupies a dedicated 2TB region, we can eagerly populate the `SFTSpaceMap`. From then on, we only need read-only access for SFT entries.
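A minimal sketch of that read-only lookup, assuming each space occupies a dedicated, aligned 2 TB (2^41-byte) region (the constant names and the use of space names in place of `&dyn SFT` entries are made up for illustration):

```rust
// Hypothetical: with 2 TB-aligned spaces, the space index is just the
// high bits of an address, and the table can be a plain array that is
// eagerly populated at startup and only read afterwards -- no atomics
// needed on the lookup path.
const LOG_BYTES_IN_SPACE: usize = 41; // 2 TB per space (assumption)
const MAX_SPACES: usize = 32; // hypothetical limit

fn space_index(addr: usize) -> usize {
    addr >> LOG_BYTES_IN_SPACE
}

fn main() {
    // Eagerly populated once; afterwards read-only.
    let mut table: Vec<&'static str> = vec!["empty"; MAX_SPACES];
    table[1] = "some-space"; // hypothetical space at [2 TB, 4 TB)

    let addr = (1usize << LOG_BYTES_IN_SPACE) + 0x1234;
    println!("{}", table[space_index(addr)]);
}
```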
For `SFTDenseChunkMap`, only the first-level map (from chunk address to space index) needs to be atomic. The second level is a map from space index to `&dyn SFT`; it can be eagerly populated and only needs read-only access. Because we cannot have too many spaces, the first level only needs one byte per chunk, and there is no problem making that atomic.
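The two-level scheme could look roughly like the following sketch (type names, the chunk count, and the use of space names in place of `&dyn SFT` are assumptions for illustration):

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const CHUNKS: usize = 8; // hypothetical chunk count
const NO_SPACE: u8 = u8::MAX;

struct DenseChunkMap {
    // First level: one atomic byte per chunk holding a space index.
    // Only this level is ever mutated concurrently.
    index_of_chunk: [AtomicU8; CHUNKS],
    // Second level: space index -> SFT entry (a space name stands in
    // for `&dyn SFT` here); eagerly populated, then read-only.
    sft_of_space: Vec<&'static str>,
}

impl DenseChunkMap {
    fn new(sft_of_space: Vec<&'static str>) -> Self {
        DenseChunkMap {
            index_of_chunk: std::array::from_fn(|_| AtomicU8::new(NO_SPACE)),
            sft_of_space,
        }
    }

    fn set_chunk(&self, chunk: usize, space: u8) {
        self.index_of_chunk[chunk].store(space, Ordering::Release);
    }

    fn sft_for_chunk(&self, chunk: usize) -> Option<&'static str> {
        match self.index_of_chunk[chunk].load(Ordering::Acquire) {
            NO_SPACE => None,
            i => Some(self.sft_of_space[i as usize]),
        }
    }
}
```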
For `SFTSparseChunkMap`, each chunk maps directly to a `&dyn SFT`. But if every sft_map entry can only change from NULL to a non-NULL `&dyn SFT`, we can split a `&dyn SFT` into two words and load each of them atomically (that is, two 64-bit atomic loads, but not one 128-bit atomic load). If either of the two words is NULL, we repeat loading until both are non-NULL. We only call methods of the `&dyn SFT` after we have obtained both non-NULL words, combined them, and transmuted the combined 128-bit value into a `&dyn SFT`.
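The split-word idea could be sketched as below. This is only a sketch: it transmutes between `&dyn SFT` and two words, which relies on the de-facto (but not language-guaranteed) layout of fat pointers, and the trait and type names are made up:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

trait Sft {
    fn name(&self) -> &'static str;
}

struct SomeSpace; // hypothetical space type
impl Sft for SomeSpace {
    fn name(&self) -> &'static str { "some-space" }
}

// One sparse-chunk-map entry: the two halves of a `&dyn Sft` fat
// pointer (data pointer and vtable pointer), each in its own word.
// 0 means "not yet set"; entries only go from NULL to non-NULL.
struct SftEntry {
    lo: AtomicUsize,
    hi: AtomicUsize,
}

impl SftEntry {
    const fn new() -> Self {
        SftEntry { lo: AtomicUsize::new(0), hi: AtomicUsize::new(0) }
    }

    // Store: split the fat pointer into two words, store each atomically.
    fn set(&self, sft: &'static dyn Sft) {
        // SAFETY: assumes a fat pointer is exactly two words; a real
        // implementation should prefer a stabilized raw-parts API.
        let [lo, hi]: [usize; 2] = unsafe { std::mem::transmute(sft) };
        self.lo.store(lo, Ordering::Release);
        self.hi.store(hi, Ordering::Release);
    }

    // Load: two 64-bit atomic loads instead of one 128-bit load.
    // Retry while either word is still NULL (spins until `set` is called).
    fn get(&self) -> &'static dyn Sft {
        loop {
            let lo = self.lo.load(Ordering::Acquire);
            let hi = self.hi.load(Ordering::Acquire);
            if lo != 0 && hi != 0 {
                // SAFETY: both words came from a previously stored
                // `&'static dyn Sft`, so recombining them is sound
                // under the same layout assumption as in `set`.
                return unsafe { std::mem::transmute([lo, hi]) };
            }
            std::hint::spin_loop();
        }
    }
}
```

Both halves of a valid `&dyn Sft` are non-zero (the reference is non-null and the vtable pointer is non-null), so 0 is a usable "unset" sentinel for each word.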
Cross-compiling without `target-feature` probably results in the most conservative code. But that's fair: if we produce a binary, we won't assume a specific target feature. I am a bit surprised that `portable_atomic` has runtime feature detection, but did not implement atomic load using the detection on aarch64 Linux (they use it for CAS). Maybe LDXP/STXP is cheaper than detection + LSE instructions?
You are right that we can probably refactor so that we do not really need atomic access to the SFT for `SFTSpaceMap` and `SFTDenseChunkMap`. For `SFTSparseChunkMap`, however, we are not changing from NULL to non-NULL; we mutate between an empty SFT (`EMPTY_SPACE_SFT`) and a valid SFT. We cannot tell whether an entry is for an empty space until we dereference it.
We currently store SFT entries as `&dyn SFT`, which is a fat pointer. This is only efficient when the platform supports atomic instructions for double word size, i.e. 128-bit atomics on 64-bit machines, and 64-bit atomics on 32-bit machines. We will give a warning in a debug build if the platform does not support double-word atomics, in which case a fallback locking mechanism is used to ensure atomic access.