Closed alainsanguinetti closed 8 years ago
Looks good, except can you rebase your commits to just one please?
I am investigating this issue at the moment and it seems to be either a bug in pluginlib or in our usage of it (or both). I don't like the solution found here. I'll report back once I have a working diff or I give up. :)
@davetcoleman here is the rebase. @v4hn I agree that my solution looks a bit like a workaround :-) I think it's a combination of boost shared pointer and classloader destructors happening in the wrong order or classloader happening too late when the library is already unloaded by the program ?
@v4hn @alainsanguinetti
The fundamental issue underlying #592 is indeed a design flaw of class_loader
that I noticed a while ago: Both, class_loader
and MoveIt
use some static variables (in case of MoveIt
the SharedStorage), whose release order is not well defined. This causes the ClassLoader
stuff to be freed before SharedStorage
, which attempts to access the former memory when ShardStorage
is freed.
Actually, ClassLoader shouldn't free it's stuff as long as there are any pointers pending. However, this requires a rewrite of ClassLoader...
I'm not really sure, how the proposed fix here, actually can solve the issue: Resetting the pointer is exactly what the default destructor would do as well, isn't it?
@rhaschke I think the behaviour you mention is what class_loader is supposed to do. Plus I think there is a difference between manually resetting a boost shared pointer and letting the default destructor delete. I still use normal pointer to avoid that kind of complexity.
Indeed, ClassLoader shouldn't unload the library as long as it's still in use. Using MoveIt's rviz plugin, I actually get an appropriate warning message when unloading the plugin. However, here we get an error that ClassLoader didn't loaded the library and thus isn't aware of it. I'm not sure where this originates from. Actually the kinematics loader was loaded via the ClassLoader mechanism...
As far as my investigation goes, resetting the shared_ptr explicitly changes the order of destruction. Normally the destructor of a shared_ptr'ed object is called after the object and all of its members are destructed. So resetting the shared_ptr beforehand moves the destructor call of the loader before the destruction of all other members.
@alainsanguinetti could you please test #705 to see if this also resolves the problem? While it also does not address the real problem, it at least removes unnecessary code instead of adding more code.
After debugging it for some more time and reading through @rhaschke's analysis, this is my understanding of the issue:
MoveIt! uses static variables in SharedStorage
and class_loader uses a number of static variables in class_loader::class_loader_private
.
Static function-local variables seem to be destroyed in the inverse order of the respective functions being called for the first time.
This results in at least the mutex returned by getLoadedLibraryVectorMutex
(likely most class_loader_private
variables) being destructed before MoveIt!'s SharedStorage
.
However, only when destructing SharedStorage
, the KinematicsPluginLoader
object is destructed and only then pluginlib tries to unload the libraries.
Thus, this unloading operates on officially destructed objects.
This happens with either patch.
However, depending on the exact order of destructors, the data still seems to be (deterministically) usable in some cases, but scrambled in others.
This is a severe shortcoming in the architecture of class_loader
, and it would be great if someone could fix it there: The unloading of libraries (again as @rhaschke mentioned in an issue there) should not be tied to the lifetime of class_loader but to the respective instances of the libraries classes too.
Leaving aside #705 that I believe to be reasonable either way, I can see only one way out of this problem that does not imply rewriting class_loader (anyone is welcome to do this for upcoming ROS releases, I suppose).
We could move away from automatic destruction of the SharedStorage
and explicitly destruct it somewhere instead. However, that somewhat defies the purpose, because the only place I can think of is in the destructor of MoveGroupImpl
et.al.
Even with some shared_ptr
construction the Storage would not be shared anymore when MoveGroup
's are being created and destroyed over and over again ...
As mentioned also by @v4hn, #705 will not fix the issue. The only workaround, I can see for now, is to disable destruction of SharedStorage
at all. This leaves a memory leak, but I guess that's fine, because until destruction of the process, the shared storage should be maintained anyway...
See #706 for a PR.
PR #705 is a better fix than this one so I close this one to avoid confusion.
Issue #592 was very annoying and prevented properly exiting from a running program.
I think this solves the issue, it does for my quite complex software and for the very simple example provided by @fontysrobotics.
Valgrind seem to have some complaints but:
Please let me know if I need to provide additionnal information.