moveit / moveit_ros

THIS REPO HAS MOVED TO https://github.com/ros-planning/moveit
70 stars 118 forks source link

#592 fixed by manually resetting the kinematics loader #702

Closed alainsanguinetti closed 8 years ago

alainsanguinetti commented 8 years ago

Issue #592 was very annoying and prevented properly exiting from a running program.

I think this solves the issue, it does for my quite complex software and for the very simple example provided by @fontysrobotics.

Valgrind seem to have some complaints but:

Please let me know if I need to provide additionnal information.

davetcoleman commented 8 years ago

Looks good, except can you rebase your commits to just one please?

v4hn commented 8 years ago

I am investigating this issue at the moment and it seems to be either a bug in pluginlib or in our usage of it (or both). I don't like the solution found here. I'll report back once I have a working diff or I give up. :)

alainsanguinetti commented 8 years ago

@davetcoleman here is the rebase. @v4hn I agree that my solution looks a bit like a workaround :-) I think it's a combination of boost shared pointer and classloader destructors happening in the wrong order or classloader happening too late when the library is already unloaded by the program ?

rhaschke commented 8 years ago

@v4hn @alainsanguinetti The fundamental issue underlying #592 is indeed a design flaw of class_loader that I noticed a while ago: Both, class_loader and MoveIt use some static variables (in case of MoveIt the SharedStorage), whose release order is not well defined. This causes the ClassLoader stuff to be freed before SharedStorage, which attempts to access the former memory when ShardStorage is freed. Actually, ClassLoader shouldn't free it's stuff as long as there are any pointers pending. However, this requires a rewrite of ClassLoader...

I'm not really sure, how the proposed fix here, actually can solve the issue: Resetting the pointer is exactly what the default destructor would do as well, isn't it?

alainsanguinetti commented 8 years ago

@rhaschke I think the behaviour you mention is what class_loader is supposed to do. Plus I think there is a difference between manually resetting a boost shared pointer and letting the default destructor delete. I still use normal pointer to avoid that kind of complexity.

rhaschke commented 8 years ago

Indeed, ClassLoader shouldn't unload the library as long as it's still in use. Using MoveIt's rviz plugin, I actually get an appropriate warning message when unloading the plugin. However, here we get an error that ClassLoader didn't loaded the library and thus isn't aware of it. I'm not sure where this originates from. Actually the kinematics loader was loaded via the ClassLoader mechanism...

v4hn commented 8 years ago

As far as my investigation goes, resetting the shared_ptr explicitly changes the order of destruction. Normally the destructor of a shared_ptr'ed object is called after the object and all of its members are destructed. So resetting the shared_ptr beforehand moves the destructor call of the loader before the destruction of all other members.

v4hn commented 8 years ago

@alainsanguinetti could you please test #705 to see if this also resolves the problem? While it also does not address the real problem, it at least removes unnecessary code instead of adding more code.

After debugging it for some more time and reading through @rhaschke's analysis, this is my understanding of the issue: MoveIt! uses static variables in SharedStorage and class_loader uses a number of static variables in class_loader::class_loader_private. Static function-local variables seem to be destroyed in the inverse order of the respective functions being called for the first time. This results in at least the mutex returned by getLoadedLibraryVectorMutex (likely most class_loader_private variables) being destructed before MoveIt!'s SharedStorage. However, only when destructing SharedStorage, the KinematicsPluginLoader object is destructed and only then pluginlib tries to unload the libraries. Thus, this unloading operates on officially destructed objects. This happens with either patch. However, depending on the exact order of destructors, the data still seems to be (deterministically) usable in some cases, but scrambled in others. This is a severe shortcoming in the architecture of class_loader, and it would be great if someone could fix it there: The unloading of libraries (again as @rhaschke mentioned in an issue there) should not be tied to the lifetime of class_loader but to the respective instances of the libraries classes too.

v4hn commented 8 years ago

Leaving aside #705 that I believe to be reasonable either way, I can see only one way out of this problem that does not imply rewriting class_loader (anyone is welcome to do this for upcoming ROS releases, I suppose). We could move away from automatic destruction of the SharedStorage and explicitly destruct it somewhere instead. However, that somewhat defies the purpose, because the only place I can think of is in the destructor of MoveGroupImpl et.al. Even with some shared_ptr construction the Storage would not be shared anymore when MoveGroup's are being created and destroyed over and over again ...

rhaschke commented 8 years ago

As mentioned also by @v4hn, #705 will not fix the issue. The only workaround, I can see for now, is to disable destruction of SharedStorage at all. This leaves a memory leak, but I guess that's fine, because until destruction of the process, the shared storage should be maintained anyway... See #706 for a PR.

alainsanguinetti commented 8 years ago

PR #705 is a better fix than this one so I close this one to avoid confusion.