microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
10.21k stars 1.51k forks source link

<memory>: Is this too late for changes to atomic_shared_ptr ? #748

Closed AlexGuteniev closed 4 years ago

AlexGuteniev commented 4 years ago

Is this too late for ABI changes for #601 ?

@BillyONeal wanted to add atomic waits there, and it could be really useful.

But for that to perform best need to avoid unnecessary notify_onecall when there's no contention. This can only be achieved by using one more bit, which affects ABI

AlexGuteniev commented 4 years ago

Actualy, entire enforcing spinlock with wait cannot be done, after it is released, so if not going to review #593 soon, then an alternative should be used there in atomic_shared_ptr.

BillyONeal commented 4 years ago

I think we might be OK because this is a std:c++latest feature, and things in latest aren't ABI locked. If we aren't, we'll need to either merge the wait change very quickly or merge a temporary change that disables atomic<shared_ptr>.

Thanks for poking on this; I knew it would be a problem but got distracted by vcpkg stuff....

AlexGuteniev commented 4 years ago

Third option would be to make simpler version of wait change. This simpler version can be based on parallel_algorithms.cpp stuff. You don't need to handle odd sized, misaligned, XP, etc, so parallel_algorithms.cpp thing would do. Except need to expose _one.

AlexGuteniev commented 4 years ago

I mean you just pass that byte from atomic_shared_ptr to __std_execution_wait_on_uchar and create __std_execution_wake_by_address_one exactly like __std_execution_wake_by_address_all, and do a proper spinlock.

By the way, is it guaranteed that only one copy of import libary used for old and new translation units in those scenarios where you don't want ABI break? (If not, contention table may be a problem in parallel_algorithms.cpp, conversion to satellite might help)

Apparently contention table in import library is a problem for atomic_shared_ptr, but not for parallel algorithms. The /MD or /MDd scenario with separately compiled binaries.

AlexGuteniev commented 4 years ago

I think the best option now is to speed up with #593 . Probably more concentrate on ABI, the .cpp can be fixed later if needed. Say, with current ABI I can switch to Semaphore fallbacks, if current fallbacks are wrong; but I could not if there wasn't "unwait" feature.

BillyONeal commented 4 years ago

After some discussion I think we are safe on ABI changes as the feature is guarded by /std:c++latest

AlexGuteniev commented 4 years ago

Maybe it worth to add some detect_mismatch in advance to prevent at least some consequences? As a user I would not expect that /std:latest can break compat. Just because I would not think at this level. Would just be satisfied to have the feature working, and ignore the warning about experimental, not even bothering suppress it.

AlexGuteniev commented 4 years ago

Actually I think I see way around when currently doing nothing more:

Then in future we have two bits in addition to low lock bit, so:

So if old implementation would lock, there's no atomic wait, new implementation could just spin forever.

If new implementation would lock, old implementation can keep spinning, and other new implementation can wait after setting third bit.

As an alternative to overaligning, can use only two bits, but then replace this: https://github.com/microsoft/STL/blob/852a3085906108045a3951064baaf4952f596e9e/stl/inc/memory#L3080 With 3

Then the schema is: 0 - not locked 3 - lock without atomic wait (can't wait and ask for notify) 1 - lock with atomic wait, not being waited on (no need notify) 2 - lock with atomic wait, being waited on (need notify)

@BillyONeal what do you think on this?

BillyONeal commented 4 years ago

Then in future we have two bits in addition to low lock bit

The alignment is only 4 so we only have 2 bits total. Note that reference count control blocks can come from existing shared_ptr instances so we can't special align them for atomic<shared_ptr<>>

Again though I don't think we have to worry about this. /std:c++latest == ABI unstable.

AlexGuteniev commented 4 years ago

Ok, I see only two bits has to be used. I still don't see a problem changing current atomic_shared_ptr to lock two bits at once (without any extra changes over it).

Then it could be at least potentially compatible with this: https://github.com/AlexGuteniev/STL/blob/61cb5b9a39d71f564d79dbfaeee7fd1ae38db464/stl/inc/memory#L3201-L3248

It is just about being a bit more friendly to /std:c++latest users. But sure if you are in the very last phase of release, probably you shouldn't change anything.

AlexGuteniev commented 4 years ago

Actually, even if doing nothing, waiting version can be added on top of current without ABI break. It is due to current uses low bit, and always set the other bit to zero. So waiting version can set low two bits to zero.

There's a question where to put "notify needed" bit, it can be resolved in two ways:

  1. Bad way, but still works. Don't put it anywhere, do always notify_one. It is not good as it impacts fast path.
  2. Better way to put it in any of used pointer bits. During being locked, the pointer is not used, it is restored after _Store_and_unlock. (My preference would be to put it into somewhere that is still rarely used by pointers, and preserve most of pointer value for debugging/visualizer purposes. Highest bit is used on /LARGEADDRESSAWARE on x86 only, 3rd from right is only for exotic allocator (usually there would be max_align_t))
BillyONeal commented 4 years ago

Again, I don't think any of that is worth the extra complexity. c++latest is not ABI stabilized.