SteveMacenski commented 4 years ago

The behavior tree has some problems, outlined in #1371

I think there needs to be a design discussion about things we want to be able to accomplish and the needs of robots using this in the real world.

In particular, planning should not be allowed to run over, it needs to have a cut off. The controller and planner need contextual recoveries (possible in current land) but recovery state needs to be able to goto or similar to planning or control (whoever failed) between attempts. Its not acceptable to do all recoveries for any failure. Additionally, when a path has been invalidated, the path needs to be immediately recomputed, without throttling or waiting. Its not acceptable to poll at a rate, it needs to be reaction based.

more defaults in the PR discussion. Overall I think this is a good starting point for the BT given we didn't really know everything we wanted when it started, but I think it needs largely to be rewritten now based on shared experiences and how the stack around it has changed. BT Nav is the core of Navigation2, if this isn't clean or work for industrial applications, very little of the work we do on top of it matters

crdelsey commented 4 years ago

Its not acceptable to poll at a rate, it needs to be reaction based.

The behavior tree is polling at a rate by its very nature. We can increase the rate to high levels (I'm not sure how high we can go). If that's not acceptable, then we can't use behavior trees. However, it seems likely that for any given workload, there is a tick rate that is fast enough.

SteveMacenski commented 4 years ago

Then maybe behavior trees arent the appropriate technology choice to handle the planning and recovery states.

I don’t view this feature as optional, its required for safety and reliability of systems that arent just running a TB3 at 0.2m/s. Any amount of speed or any amount of large spaces is going to make the behavior tree as it currently sits unusable

SteveMacenski commented 4 years ago

1395 related with more context on the replanning/invalidation. the system needs to be safe with any replanning rate (or no replanning at all, which is a totally valid way to go when there's not alot of moving stuff in your environment).

It seems like there's some capabilities that are really hard to implement in the existing set of plugins and base classes, but I don't see anything about the theory behind a behavior tree that wouldn't let us do these things. I think its just about starting from ground zero again and thinking about knowing what we know now and how things are architected now, would we do things the same.

And if the answer is "no", what would those changes be and are the improvements enough worth doing. My feeling is the answer would be yes with these issues in mind and some of the other stuff related to testing between recoveries for if the thing that failed is OK now.

In particular when chatting with folks that don't know the Navigation stuff very well (or at all) but understand behavior trees they ask about things like

goto within a node to go to another node, or try to run the node just to see if its successful, or making it "graph"-like
dealing with async behaviors when you want to check something while something else is running

And then I think there's value in not a "planning node" or "control node" but "planning {request, response, feedback} nodes" and control {request, response, feedback} nodes". I think this would help us create much better and handle much better long running async actions like control (and potentially changing planning to be the same way).

General capability we're still missing for Navigation1 parody

No or vary configurable replan rate and plans recomputed on invalidation
Check if whatever state failed to compute is working again after each recovery

and I'm sure there's more things that I'm missing too since to my knowledge we haven't gotten any feedback from robot systems operating at > 1m/s environments on the regular. my feeling the behavior tree must have a couple other quirks we haven't even run into yet.

mkhansenbot commented 4 years ago

This is a very long thread, however I think that if this boils down to the two bullets listed:

No or vary configurable replan rate and plans recomputed on invalidation

Check if whatever state failed to compute is working again after each recovery

I think those are solvable. For example, we can push some of the looping logic for the replanning down from the BT into the planner_server node. We can also add ability for the controller_server to call the "compute_path_to_pose" action directly when a path is invalid. And for the recoveries, we're already working on that, in fact @mhpanah has a proposed tree for doing it.

I'm going to be offline the next couple of days but lets put this on the agenda for the next Nav2 WG next week. Until then, Happy Thanksgiving! (gobble, gobble)

SteveMacenski commented 4 years ago

Did some reading this evening on behavior trees and looked at best-practices in other projects and I think this is fine how it is. There's implementation considerations I may want to change down the line, but a re-architecture is unnecessary.

ros-navigation / navigation2

Reachitect Behavior Tree / Nodes #1387

1395 related with more context on the replanning/invalidation. the system needs to be safe with any replanning rate (or no replanning at all, which is a totally valid way to go when there's not alot of moving stuff in your environment).