magazino / move_base_flex

Move Base Flex: a backwards-compatible replacement for move_base
https://uos.github.io/mbf_docs/
BSD 3-Clause "New" or "Revised" License
424 stars 154 forks source link

Allow the controller to handle cancel if properly implemented #274

Closed corot closed 2 years ago

corot commented 3 years ago

Possibly a much safer solution for #273. This allows smooth cancels with decreasing velocities without risking the deadlocks reported in #273. The controller becomes responsible of slowing down by returning decreasing speeds on computeVelocityCommands and SUCCESS outcome. When he considers that the robot can stop, returns zero velocity and CANCELED outcome.

Trivial example implementation for TEB:

image

If the controller doesn't implement the controller plugin API cancel method, or returns false, we use the normal stop method with an abrupt zero-velocity.

:warning::warning::warning: One corner case is when we cancel the controller but it reaches the goal while slowing down. In this case, we will report success, which is a sensible result because we accomplish the requested goal despite the cancel, but one could argue that we should return preempted. :warning::warning::warning:

Example:

  1. send goal to exe_path
  2. cancel it, slowing down
  3. send the same goal before it stops --> we preempt the canceling and start executing the new goal

https://user-images.githubusercontent.com/322610/125382572-d91a1500-e3d0-11eb-92fc-cc8776a2a791.mp4

dorezyuk commented 3 years ago

@corot I had a look at the bug and it looks not good. I've dead locked the node with your code and generated a coredump. Interestingly in the coredump, I didn't find the thread execution the cancel method anymore - which explains why the cancel method never returns...

Since its related to the spinning (I'm not sure if spinning within the plugin is actually a good idea but this is not the point), I've created a dedicated callback queue just for our servers - and this seems to solve the problem.

Could you check out https://github.com/magazino/move_base_flex/tree/dima/callback_queue (without the here mentioned changed to TEB) to see if it works for you?

corot commented 3 years ago

@corot I had a look at the bug and it looks not good. I've dead locked the node with your code and generated a coredump. Interestingly in the coredump, I didn't find the thread execution the cancel method anymore - which explains why the cancel method never returns...

you mean, u had a deadlock with this PR? And which controller? default TEB or my example for slowing down?

corot commented 3 years ago

Interestingly in the coredump, I didn't find the thread execution the cancel method anymore - which explains why the cancel method never returns...

wooooo,,,, I knew something superfishy was happening,,, I though because of stopping the execution thread, but not any idea why a thread can disappear like that?

dorezyuk commented 3 years ago

you mean, u had a deadlock with this PR? And which controller? default TEB or my example for slowing down?

Your TEB example from #273

corot commented 3 years ago

Your TEB example from #273

Ah,,, yes, yes. I thought u meant this PR :relieved:

It's still interesting to investigate that strange problem, but this PR is a much cleaner solution to smooth cancels, imho

corot commented 3 years ago

Could you check out https://github.com/magazino/move_base_flex/tree/dima/callback_queue (without the here mentioned changed to TEB) to see if it works for you?

But wait,,, this can be much simpler, right? Not enough to run an async spinner instead of spin?

In any case, yes, sounds reasonable to run this multi-threaded. Did you try to reproduce the freezing with this change, but removing the ros::spinOnce();?

dorezyuk commented 3 years ago

But wait,,, this can be much simpler, right? Not enough to run an async spinner instead of spin?

Also a good idea - to me both seems fine.

Did you try to reproduce the freezing with this change, but removing the ros::spinOnce();?

Not sure about which spinOnce we are talking now :confused: About the one in cancel or in the main? It worked for me with the spinOnce in the cancel

corot commented 3 years ago

Not sure about which spinOnce we are talking now About the one in cancel or in the main? It worked for me with the spinOnce in the cancel

Then, can u approve this PR? Also @spuetz, can you test and approve this?

As for the async spinner... I see the point, but is a completely different issue and so we shouldn't mix with this PR

corot commented 3 years ago

@dorezyuk , @spuetz , Description updated; take a look at the :warning: section and tell me if u ratify my choice