Closed bdash closed 8 years ago
The change from placement-new on a mmaped buffer to copying a SharedInfo to the file is not valid because SharedInfo has pthread_mutex_t members. Because they're unlocked at the time it's unsurprising that it happens to work on many platforms, but POSIX doesn't require that memcpy on a process-shared mutex should work.
@tgoyne, I have a hard time making sense of that.
First of all, do we agree that the following scenario is supposed to work?
If that works, why would the new scheme used by SharedGroup::do_open()
not work? How can step 3 tell the difference between a mutex in a file prepared before a reboot, and one that was prepared by copying some bytes into a file? What is the significant difference?
@tgoyne, here is another scenario that is supposed to work as far as I know:
Again, if this works, why not the scheme used in SharedGroup::do_open()
?
As far as I can tell, POSIX only requires that a process shared mutex remains mapped at a fixed address from the point of view of process A while it is locked by process A.
In particular, when the mutex is not locked by anybody, its state is self contained, and can therefore be cloned and copied around as desired.
@tgoyne, I will be grateful if you can just hint at the kind of implementation of process shared mutexes that would invalidate the scheme currently used in SharedGroup::do_open()
.
This doesn't reproduce in the iOS simulator, only on an iOS device. Any iOS device capable of running iOS 9 should be able to reproduce this when following the steps I outlined above.
First of all, do we agree that the following scenario is supposed to work?
No, I do not see anything in POSIX 2013 that even hints at either of those being a supported scenario. pthread_mutex_t is explicitly an opaque type that can only be manipulated with the provided functions, and PTHREAD_PROCESS_SHARED merely makes it so that it can be used by any thread which has access to the mutex, even if the mutex is created in shared memory. Mmaping a copy of the file would not give you access to the same memory.
Mmaping a copy of the file would not give you access to the same memory.
Of course not. I only meant to imply that the copy was itself a valid mutex that can be mapped and then locked.
No, I do not see anything in POSIX 2013 that even hints at either of those being a supported scenario.
I agree that POSIX is not explicit about it, but from the man pages it is clear that the following scenario is valid:
So what you are saying, is that this scenario is only valid if the system is not rebooted during step 3 (time passes).
I would find that very odd for several reasons:
And:
All I am saying, is that as far as I can see, one can deduce from the POSIX man pages that a process shared mutex has to have a self contained state (no critical information saved outside mutex object) while it is not locked by anybody (step 3, time passes).
It is of course a separate question whether iOS adheres to that rule or not.
All I am saying, is that as far as I can see, one can deduce from the POSIX man pages that a process shared mutex has to have a self contained state (no critical information saved outside mutex object) while it is not locked by anybody (step 3, time passes).
@tgoyne, please let me know if you know anything specific about the iOS implementation of process shared mutexes that makes it not adhere to this rule.
@tgoyne, please see "Process Shared Memory and Synchronization" in the man page of pthread_mutexattr_init
. Note especially the semaphore example, and this remark:
In particular, these processes may exist beyond the lifetime of the initializing process.
Here is a little extra analysis from me:
One can imagine that iOS allocates part of the mutex state in a slot of memory somewhere outside the mutex object whenever a shared mutex is initialized. In that case the slot must be identified during any subsequent access to the mutex, and that would invalidate the reboot scenario.
However, since a file containing a process shared mutex can be created via one map, and then later accessed via another map at a a different address (even inside the same process, or thread), we can safely assume that the address of the mutex object is not stored (at least not for any critical purpose) while the mutex is not locked by anybody.
Further more, even if iOS behaves like this, the current implementation of SharedGroup::do_open()
would still work, as the mutex object is still initialized before being used.
A little digging around in Apple's pthread implementation suggests that the alignment of info
in the following block from SharedGroup::do_open
is the relevant factor:
// Write an initialized SharedInfo structure to the file, but with
// init_complete = 0.
SharedInfo info(durability, history_type);
m_file.write(reinterpret_cast<char*>(&info), sizeof info); // Throws
Information about the alignment of a mutex is stored within the pthread_mutex
structure during pthread_mutex_init
. If the alignment of info
differs from the alignment of the SharedInfo
structure in the mapped region, the alignment information in the pthread_mutex
structures will not match the alignment of the addresses the mutexes are mapped at. Changing the declaration of info
to alignas(64) SharedInfo
gives the structure the same alignment as when it's remapped, and we no longer crash.
@bdash: Thx for digging into this, we'll make a fix.
@bdash, thanks a lot for pinpointing the problem. One thing, though, I am assuming that you meant to align on a 64 bit boundary, which means that alignas(8)
will suffice. Note that alignas()
takes a "number of bytes" argument.
Also thanks a lot for the precise repro instructions.
@finnschiermer will implement a fix and release another version of core today.
With core v0.97.0, released late last week as part of Realm Cocoa v0.98.4, users are seeing crashes below
SharedGroup::do_open
on 32-bit iOS devices. This was reported to us in realm/realm-cocoa#3321.The exception looks like so:
The backtrace like so:
I was able to reproduce by doing the following:
cd realm-core
git checkout v0.97.0
REALM_ENABLE_ENCRYPTION=yes REALM_ENABLE_ASSERTIONS=yes sh build.sh config
REALM_COCOA_PLATFORMS="iphone" sh build.sh build-cocoa
cd ../realm-cocoa
git checkout v0.98.4
curl "https://gist.githubusercontent.com/bdash/d7ee7bfb4624c57f689f/raw/6bd1be5f75cc42c52ed94fbe7a0bf0f635c822c4/armv7-only.patch" | patch -p1
(hacks the Xcode projects to build for armv7 only)open examples/ios/objc/RealmExamples.xcworkspace
A
git bisect
pinpointed b78b70438dbe6dba89193c16a4fdc341f27b4f27 as the culprit for the breakage. Nothing obvious jumps out to me in that change, but I was able to verify this by switching between b78b70438dbe6dba89193c16a4fdc341f27b4f27 and its parent./cc @kspangsege