Closed AlexGuteniev closed 4 years ago
Actualy, entire enforcing spinlock with wait cannot be done, after it is released, so if not going to review #593 soon, then an alternative should be used there in atomic_shared_ptr.
I think we might be OK because this is a std:c++latest feature, and things in latest aren't ABI locked. If we aren't, we'll need to either merge the wait change very quickly or merge a temporary change that disables atomic<shared_ptr
Thanks for poking on this; I knew it would be a problem but got distracted by vcpkg stuff....
Third option would be to make simpler version of wait change.
This simpler version can be based on parallel_algorithms.cpp
stuff.
You don't need to handle odd sized, misaligned, XP, etc, so parallel_algorithms.cpp
thing would do. Except need to expose _one
.
I mean you just pass that byte from atomic_shared_ptr
to __std_execution_wait_on_uchar
and create __std_execution_wake_by_address_one
exactly like __std_execution_wake_by_address_all
, and do a proper spinlock.
By the way, is it guaranteed that only one copy of import libary used for old and new translation units in those scenarios where you don't want ABI break? (If not, contention table may be a problem in parallel_algorithms.cpp
, conversion to satellite might help)
Apparently contention table in import library is a problem for atomic_shared_ptr
, but not for parallel algorithms. The /MD or /MDd scenario with separately compiled binaries.
I think the best option now is to speed up with #593 . Probably more concentrate on ABI, the .cpp can be fixed later if needed. Say, with current ABI I can switch to Semaphore fallbacks, if current fallbacks are wrong; but I could not if there wasn't "unwait" feature.
After some discussion I think we are safe on ABI changes as the feature is guarded by /std:c++latest
Maybe it worth to add some detect_mismatch
in advance to prevent at least some consequences?
As a user I would not expect that /std:latest
can break compat. Just because I would not think at this level. Would just be satisfied to have the feature working, and ignore the warning about experimental, not even bothering suppress it.
Actually I think I see way around when currently doing nothing more:
_Ref_count_base
to 8, so that there are three low bitsThen in future we have two bits in addition to low lock bit, so:
notify_one
" flag.So if old implementation would lock, there's no atomic wait, new implementation could just spin forever.
If new implementation would lock, old implementation can keep spinning, and other new implementation can wait after setting third bit.
As an alternative to overaligning, can use only two bits, but then replace this: https://github.com/microsoft/STL/blob/852a3085906108045a3951064baaf4952f596e9e/stl/inc/memory#L3080 With 3
Then the schema is: 0 - not locked 3 - lock without atomic wait (can't wait and ask for notify) 1 - lock with atomic wait, not being waited on (no need notify) 2 - lock with atomic wait, being waited on (need notify)
@BillyONeal what do you think on this?
Then in future we have two bits in addition to low lock bit
The alignment is only 4 so we only have 2 bits total. Note that reference count control blocks can come from existing shared_ptr instances so we can't special align them for atomic<shared_ptr<>>
Again though I don't think we have to worry about this. /std:c++latest
== ABI unstable.
Ok, I see only two bits has to be used.
I still don't see a problem changing current atomic_shared_ptr
to lock two bits at once (without any extra changes over it).
Then it could be at least potentially compatible with this: https://github.com/AlexGuteniev/STL/blob/61cb5b9a39d71f564d79dbfaeee7fd1ae38db464/stl/inc/memory#L3201-L3248
It is just about being a bit more friendly to /std:c++latest
users.
But sure if you are in the very last phase of release, probably you shouldn't change anything.
Actually, even if doing nothing, waiting version can be added on top of current without ABI break. It is due to current uses low bit, and always set the other bit to zero. So waiting version can set low two bits to zero.
There's a question where to put "notify needed" bit, it can be resolved in two ways:
notify_one
. It is not good as it impacts fast path._Store_and_unlock
.
(My preference would be to put it into somewhere that is still rarely used by pointers, and preserve most of pointer value for debugging/visualizer purposes. Highest bit is used on /LARGEADDRESSAWARE on x86 only, 3rd from right is only for exotic allocator (usually there would be max_align_t
))Again, I don't think any of that is worth the extra complexity. c++latest is not ABI stabilized.
Is this too late for ABI changes for #601 ?
@BillyONeal wanted to add atomic waits there, and it could be really useful.
But for that to perform best need to avoid unnecessary
notify_one
call when there's no contention. This can only be achieved by using one more bit, which affects ABI