Closed: rhaschke closed this 7 years ago
I think the integration tests make this PR way too big - they should be split out, as discussed on the call.
I'm afraid that I cannot handle the comments before next week. Please be patient.
@rhaschke too bad. Sorry it took me so long :( @130s is it ok if this takes another week before a new indigo release? Releasing without this fixed-up request is not an option.
Hopefully before August 5th...
@davetcoleman , @130s, @mikeferguson / whoever feels responsible. This is a bit critical, so please read and comment.
@davetcoleman this is not going to happen before Friday; I've been told @rhaschke is on vacation starting this Monday.
He will be really pissed because of that, but at the moment I'm considering a revert of the whole list of commits for #442 up to now. Maybe that's what we should have done directly after the unreviewed merge...

We have at least one open bug with respect to these changes by @dornhege, a list of things that should be improved, ABI breakage w.r.t. the last release, and I just noticed a segfault in the RViz plugin in my setup when removing the MotionPlanning display, which gdb blames on the new `spinner_` that was added to the PlanningSceneMonitor. I'm not sure we can have that spinner there: ros_comm keeps a global recursive mutex around spinners, and only a single thread may run them. I'm pretty sure this breaks the API in a totally unintuitive way (definitely not acceptable in indigo), because if you keep an `AsyncSpinner` around in your main thread and create the monitor from a different thread, things will probably break.

Overall, this is no state in which we should let people work on "minor" issues, and the whole "fix the start state" patch is clearly not ready for a release.

I therefore propose to revert these change-sets and only re-add the part that makes the ExecutionController check whether the trajectory it is about to execute starts somewhere around the current state. This should be possible without most of this code and without breakage. Later on we can focus on getting the rest to work without this many issues.
Yes, after seeing https://github.com/ros-planning/moveit_ros/issues/736 and possibly related https://github.com/ros-planning/moveit/issues/10 I have been worried about the state of the PlanningSceneMonitor. Considering it was never fully reviewed I do not think we should feel guilty about the revert. +1
+1 for the revert, ideally before the merge. Unless it is shown that #736 is not an issue in general, or is independent from these changes, we cannot release. The PR is still there to be worked on, but this seems to be non-trivial to get right.
Looks like AsyncSpinner isn't really usable as intended. As I can't work on the issue in the next days, go for the revert as suggested by @v4hn.
@v4hn can you do the revert today before tomorrow's merge?
I opened a request to revert the previous merge commits in #742 .
I'll close this request. We will have to look through the changes again anyway and rebase/reformat them for the merged repo.
@v4hn @dornhege @davetcoleman
I'm back from vacation and I'm going to look into the `AsyncSpinner` issue now. Obviously any use of an `AsyncSpinner` is currently not safe: if several spinners are started from different threads, only the very first one will actually be started. Hence which spinner ends up active depends on the order in which they are started, which is not acceptable. As there are multiple `AsyncSpinner`s used throughout the source, #736 reported by @dornhege might be only the tip of the iceberg. I will try to fix that issue upstream in ros_comm and then analyze why it failed in #736 but not for me.
On Mon, Aug 15, 2016 at 12:32:20AM -0700, Robert Haschke wrote:

> I will try to fix that issue upstream in ros_comm and then analyze why it failed in #736 but not for me.

Thanks! This is clearly a bug in ros_comm, and the commit that introduced the global mutex there is at fault. The trouble is, it might be hard to remove it again without understanding why it was introduced in the first place. Maybe it's worth a try to contact the author? It has been a couple of years, but maybe he remembers the original bug (if there was one)?
Welcome back @rhaschke, hope vacation was good! Thanks for continuing to work on this
The other usages of `AsyncSpinner` in the moveit_ros source tree are uncritical; they mostly belong to stand-alone programs. I think the reason that @dornhege observed the bug, but I didn't, is that he was running the whole perception pipeline too: the perception/mesh_filter instantiates its own PlanningSceneMonitor and thus triggered the error. I prepared https://github.com/ros/ros_comm/pull/867 and hope that they will consider it timely.
Wow, given the scope of your patch @rhaschke I'm pretty sure this will not be merged in indigo, not sure about kinetic. But we should probably fix the safety issue in indigo...
This continues #716 and #724, which were both merged already. I repeat here my comments to #724:

@v4hn Realizing the simplification we agreed on in the phone call turned out to be rather difficult:

- `stateUpdateTimerCallback()`: removing the additional timestamp check caused deadlocks. Hence, I reverted these changes.
- `validatePlan()` moved from `MoveGroupInterface` (client side) to `MoveGroup`'s `ExecutionServiceCapability` (server side), as agreed.
- Simplification of `syncSceneUpdates()`: we definitely need both while loops and the timeout. When the robot doesn't move at all, we will never receive a scene update. The first loop is required to check for a recent robot state update directly in the CSM, while the second loop (in case there is no CSM) waits, with a timeout, for a direct scene update. There is no way to omit this timeout; the only alternative would be to trigger all input channels of the PSM to send updates, which is not possible. However, in `move_group` this doesn't pose a problem, because there is a CSM instantiated.
- `syncSceneUpdates()` after trajectory execution makes sense. Doing so led to failure of the test suggested in #442. Now I fixed it.
- `CSM::waitForCurrentState()` didn't do what I expected from the name...

Some more thoughts:

- `syncSceneUpdates()` should be renamed to `waitForCurrentRobotState()`.
- `waitForCurrentState()` in `CurrentStateMonitor` should actually be renamed to `waitForCompleteState()`, as it doesn't look at recent timestamps.
- `syncSceneUpdates()` incorporates all pending scene updates in the callback queue. I agree with @v4hn that this doesn't guarantee that all scene updates up to a given timestamp are incorporated, because ROS messages might arrive delayed due to network latency. The only guarantee for a synchronous scene update is the new method `applyPlanningScene()`.