Closed VCommitter closed 7 years ago
Yesterday I confirmed that this unload segfault occurs on both the 8.0 and 8.1 versions of libV.
Fascinating. The library load order matches the order in which the libraries are listed in the make.llist file (and specified on the gcc linking command line). strace will gladly show the run-time load order:
strace -e trace=open,close vpooladmin -serverFile=serverFile.txt -clientcount
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/tls/x86_64/libVCore.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/tls/libVCore.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/x86_64/libVCore.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVCore.so", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libV.so", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVca.so", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVsa.so", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVcaMain.so", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
open("/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
gdb can also help examine shared library loads:
(gdb) set stop-on-solib-events 1
(gdb) run -serverFile=serverFile.txt -clientcount
Starting program: /home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/vpooladmin -serverFile=serverFile.txt -clientcount
Stopped due to shared library event (no libraries added or removed)
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.2.x86_64
(gdb) continue
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Stopped due to shared library event:
Inferior loaded /home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVCore.so
/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVsa.so
/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVcaMain.so
/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libVca.so
/home/osvadmin/vision-open-source/software/builds/8.0.0/Linux_x86_64/bin/../lib/libV.so
/lib64/libpthread.so.0
/lib64/libuuid.so.1
/lib64/libstdc++.so.6
/lib64/libm.so.6
/lib64/libgcc_s.so.1
/lib64/libc.so.6
I'm a bit puzzled. It looks like you only moved 'VThread' to VCore even though VThread depends on a number of other components that remain in V. I trust that it makes the crash go away. I also recognize that the Linux linker is willing to resolve upward and downward, especially given the default 'nux symbol visibility policy of exposing everything. I wonder if this works when compiler and linker options are set to control symbol visibility more tightly (e.g., __declspec). If memory serves, I think your 'online' Linux builds do that. Is that an issue, and have you tried this there?
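For readers unfamiliar with tighter visibility control, here is a minimal sketch of what it usually looks like in source. The macro names V_EXPORT and V_LOCAL and both functions are hypothetical, chosen only for illustration; nothing here is taken from the V sources or from the 'online' build. With gcc, building a library with -fvisibility=hidden makes only symbols marked "default" resolvable from other shared objects.

```cpp
#include <cassert>

// Hypothetical export macros; the names are illustrative, not from the V sources.
#if defined(_WIN32)
#  define V_EXPORT __declspec(dllexport)
#  define V_LOCAL
#else
#  define V_EXPORT __attribute__((visibility("default")))
#  define V_LOCAL  __attribute__((visibility("hidden")))
#endif

// Internal helper: hidden from other shared objects when built into a library.
V_LOCAL int internalHelper (int x) { return x + 1; }

// Public entry point: stays resolvable from other shared objects
// even when the library is compiled with -fvisibility=hidden.
V_EXPORT int publicEntry (int x) { return internalHelper (x) * 2; }
```

Under such a policy, a class moved into a lower-level library can only call back into a higher-level one through symbols that the higher-level library explicitly exports, which is the concern raised above.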
I only moved VThread to libVCore.so because I tracked down the exact static that was getting recreated in an infinite loop and determined that it was all that would need to move. I almost just put a guard in the code so that the static could never get recreated in a given thread, but thought a minimal libVCore.so might have utility for controlling static destructor order in the future. And the smallest possible libVCore.so seems like the most useful libVCore.so, so I didn't move anything else.
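For reference, the guard alternative mentioned above (which was not the fix applied) might look roughly like the following sketch. Helper, reclaim and t_reclaiming are hypothetical stand-ins, not the real V classes; the sketch only shows a thread-local flag stopping a fallback object from being recreated while a reclaim is already in progress on the same thread.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real VThread / VReferenceable classes.
static int g_helpersCreated = 0;      // counts fallback helpers allocated

struct Helper {
    Helper () { ++g_helpersCreated; }
    ~Helper ();                       // destroying a helper triggers another reclaim
};

// Per-thread flag: true while this thread is already inside reclaim ().
static thread_local bool t_reclaiming = false;

void reclaim () {
    if (t_reclaiming)                 // re-entered from ~Helper: do not allocate again
        return;
    t_reclaiming = true;
    {
        Helper fallback;              // models allocating a fallback thread object
    }                                 // ~Helper runs here and re-enters reclaim ()
    t_reclaiming = false;
}

Helper::~Helper () { reclaim (); }

// Returns the number of helpers that one top-level reclaim creates.
int helpersForOneReclaim () {
    assert (!t_reclaiming);           // must be called from outside a reclaim
    g_helpersCreated = 0;
    reclaim ();
    return g_helpersCreated;
}
```

In this model a single top-level reclaim creates exactly one helper, because the nested call made from the helper's destructor returns immediately instead of allocating another one.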
Are you suggesting that adding libVCore to the online system would cause it not to build? I haven't tried that because the online system is not on a version of gcc or Linux that has the new Linux ABI. It would be a fair bit of work to bring OSV back into online just to run that test.
Regarding OSV and online, if it's not an issue, it's not something that needs to be done (at least for now).
Given that your fix solves a problem, that's probably good enough. Still, given what you're telling me about its possible dependence on ABI version, I wonder how stable the fix will be in the OSV world. Sounds like a lot of testing and finger crossing ahead. For example, I'd want to test this on at least Solaris (x86 and sparc) and a few more Linux variants (I'm away from my lab for the weekend, but when I'm back, I can set up some of that).
Beyond these questions, the more it seems that the real problem is that there's a lot of stuff attached to VTransientServices that probably doesn't belong there. I realize there was no way to know this, but in its original incarnation, VTransientServices was kind-of/sort-of supposed to abstract some low level operating system level services (not very well, but it's one of the oldest C++ classes in our codebase). The logging and related stuff that's in there now is definitely mission creep. Their fragility testifies to that. Out of curiosity, if you were to comment out the routines and state in transient services having to do with 'VString', how far up the food chain would you have to go before things stop compiling? I'd bet it's Vsa (maybe Vca). I can't help but wonder if we can't move the required functionality up to that level (maybe even a static instance of a VTransientServices subclass could be added there).
Hmmm...
Wonder if it would work?
A bit more on what I think is happening here. A VReferenceable is being reclaimed at shared object unload time. In the test case here it's a VString that lives in VApplicationLog that kicks it off. So it gets reclaimed:
void V::ThreadModel::Multi::reclaim (VReferenceableBase *pObject) {
    VThread::ReclaimObject (pObject);
}
VThread is trying to do the reclamation:
static void ReclaimObject (VReferenceableBase *pObject) {
    Here ()->reclaimObject (pObject);
}
But the Here member function is causing a problem:
V::VThread::Reference V::VThread::Here () {
    BaseClass::Reference pSpecific; Reference pThisInstance;
    if (g_iTLSKey.getSpecific (pSpecific) && pSpecific.isntNil ())
        pThisInstance.setTo (static_cast<ThisClass*>(pSpecific.referent ()));
    else
        pThisInstance.setTo (new VUnmanagedThread ());
    return pThisInstance;
}
I believe (and this is the theory bit) that the result of g_iTLSKey.getSpecific() is always Nil because some of the VThread statics, specifically V::VThreadSpecific::Key const V::VThread::g_iTLSKey;, have already been destroyed.
So in order to delete the VString you create a new VUnmanagedThread. This would be fine, and the VString would be properly destroyed, except that the VUnmanagedThread is also a VReferenceable that must get destroyed when VThread::ReclaimObject exits. Because VUnmanagedThread is a VReferenceable, it needs another VUnmanagedThread in order to be destroyed (g_iTLSKey is still gone). Now you have an infinite recursion.
Destroying the VUnmanagedThread you just created involves destroying its inherited VReferenceable, which creates another VUnmanagedThread. You continue to allocate another VUnmanagedThread to delete the prior VUnmanagedThread until you overflow the stack and die with a segfault.
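The recursion described above can be modelled in a few lines. This is only a sketch of the theory, not the V code: UnmanagedThread, reclaimObject and g_tlsKeyAlive are hypothetical stand-ins, and kDepthCap is a demo-only limit added so the example terminates instead of overflowing the stack the way the real process does.

```cpp
#include <cassert>

// Hypothetical model of the failure; none of these are the real V classes.
static bool g_tlsKeyAlive = true;     // stands in for g_iTLSKey still being usable
static int  g_depth = 0;              // current nesting of reclaim calls
static int  g_maxDepth = 0;           // deepest nesting observed
static const int kDepthCap = 1000;    // demo-only cap; the real process has none

void reclaimObject ();

struct UnmanagedThread {
    // Destroying the fallback thread object is itself a reclaim.
    ~UnmanagedThread () { reclaimObject (); }
};

void reclaimObject () {
    ++g_depth;
    if (g_depth > g_maxDepth) g_maxDepth = g_depth;
    if (!g_tlsKeyAlive && g_depth < kDepthCap) {
        UnmanagedThread fallback;     // models Here () allocating a new fallback thread
    }                                 // ~UnmanagedThread re-enters reclaimObject ()
    --g_depth;
}

// Runs one top-level reclaim and reports how deeply it nested.
int maxDepthFor (bool tlsKeyAlive) {
    g_tlsKeyAlive = tlsKeyAlive;
    g_depth = g_maxDepth = 0;
    reclaimObject ();
    assert (g_depth == 0);            // every nested call has unwound
    return g_maxDepth;
}
```

With the key alive, one reclaim nests to depth 1. With the key gone, the nesting only stops at the artificial cap, which corresponds to the stack overflow seen at unload.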
Putting g_iTLSKey into a lower-level shared object, VCore, causes it to be kept around while all other statics that inherit from VReferenceable are deleted, and the executable exits without an error.
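The underlying requirement is that the key outlive every static that still needs it during teardown. The fix above achieves that through library layering. The sketch below shows the same ordering requirement with a different, swapped-in technique (function-local statics constructed on first use) purely because it fits in one file. Key, LogHolder, key and logHolder are hypothetical names, not V classes.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for g_iTLSKey.
struct Key {
    bool alive = true;
    ~Key () { alive = false; std::puts ("Key destroyed"); }
};

// Function-local static: constructed on first call.
Key &key () { static Key k; return k; }

// Hypothetical stand-in for a static that owns reclaimable objects.
struct LogHolder {
    // Touch the key first so its construction completes before ours does.
    LogHolder () { key (); }
    ~LogHolder () {
        // Statics are destroyed in reverse order of completed construction,
        // so the key is still alive while this destructor runs.
        assert (key ().alive);
        std::puts ("LogHolder destroyed while Key is still alive");
    }
};

LogHolder &logHolder () { static LogHolder h; return h; }
```

A program that calls logHolder () once will, at exit, destroy the LogHolder before the Key, which is the order the move to VCore is meant to guarantee across shared objects.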
In shared libraries that use the GNU/Linux ABI on unload of libV.so memory reclamation of VReferenceable heap objects from static instances within libV.so on process exit.
So far this only affects shared libraries built using the GNU/Linux ABI using gcc.
Prior Experience
VString instances from libV.so in release-8.1 that work around a similar unload.
Reproducer
VApplicationLog, which is a member of the static VTransientServices, has three VString members. If any one of them is set, the executable that sets it overflows its stack and segfaults on library unload:
Backtrace