Comparison in A* algorithm is wrong

ningit commented 3 years ago

Bug report

Required Info:

Operating System:
- Any.
ROS2 Version:
- Any.
Version or commit hash:
- All commits in master including the current one (909c6f2da7423440d7bd4db3115a82cf8802249a).

Steps to reproduce issue

The issue can be observed by looking at the source code, but does not probably affect the behavior of the planner.

Expected behavior

When examining the neighbors to be explored in the A* algorithm, potentials are compared with potentials or complete path estimates are compared with complete path estimates.

Actual behavior

When examining the neighbors to be explored in the A* algorithm, potentials are compared with complete path estimates.

Additional information

There is a conceptual error in the code of the updateCellAstar method of the NavFn planner, since potentials are compared with estimates of the complete path to decide whether a cell is explored. However, this problem does not have practical consequences, because the Euclidian distance is admissible for the A* algorithm, and no cell needs to be considered twice. In fact, the comparison is superfluous assuming the Euclidean distance is used.

The code into question is in line 550 of navfn.cpp, where affected neighbors are added to priority blocks according to the comments and to the standard A* behavior. Apparently, neighbors whose potential can be improved from the current node are added to the queue, to be processed in the next iteration. However, only unvisited nodes would be added to the queue, because incomparable quantities are compared. These comparisons are done for each direction, but the code below and the explanation refer only to the left neighbor without loss of generality.

// line 499
float u, d, l, r;
l = potarr[n - 1];
// [...] line 539
float le = INVSQRT2 * static_cast<float>(costarr[n - 1]);
// [...] line 547
float dist = hypot(x - start[0], y - start[1]) * static_cast<float>(COST_NEUTRAL);
// [...] line 549
potarr[n] = pot;
pot += dist;
if (pot < curT) {
  if (l > pot + le) {push_next(n - 1);}
  // [...]

The expansion of the condition is potarr[n-1] > potarr[n] + dist + le, i.e. a distance from a cell to the destination (approximately) is compared to a complete distance from the origin to the destination. This comparison is unlikely to hold unless l is infinity (the initial value of the potential) or the cells are very close to the origin.

In other words, using the notation of the Wikipedia page on the A* algorithm, this code is comparing g(n) with f(n) = g(n) + h(n).

SteveMacenski commented 3 years ago

I see no issue with this, there's no expectation that this is a pureist A* implementation from a wikipedia article. There are additional optimizations here because we're dealing with wavefront potentials and very low level memory buffer management.

Especially if you say that this doesn't have any practical impact, I'm not sure what we're talking about here. Can this be closed?

ningit commented 3 years ago

The statement pot += dist in line 550 is unnecessary. It is a mistake, it makes the following comparison absurd, not just heteredox. It cannot be an optimization (adding unnecessary instructions cannot be an optimization).

While there is no practical impact in the current setting, if you ever wish to change the heuristic by another non-admissible one, your algorithm will stop working properly. If the heuristic will never change, then your algorithm is doing unnecessary additions and comparisons, against performance.

SteveMacenski commented 3 years ago

This is not a purist A* implementation if you read through the code in a number of ways and there is no plan to ever change NavFn's heuristics (or in general code) unless a significant bug or performance improvement is determined.

If you remove that addition and things still seem to work ok, you see the exact same number of iterations and path, I'd consider approving a PR to remove it

SteveMacenski commented 3 years ago

Any action to be taken here? If not, I'd prefer to close this.

ningit commented 3 years ago

I am preparing a PR that removes the comparisons in that fragment of updateCellAstar. The change has no effect on the paths of any of the tests in the repository (in fact, these conditionals are never executed without the left-hand side being POT_HIGH) and something can be proven in general.

We are aware that this A* implementation is not a textbook one. In fact, we have studied it in detail (in this repo).

ningit commented 3 years ago

After checking that the changes that would be proposed in the PR promised in the previous are not absolute improvements, at least in a practical sense, I desist from sending it and I have nothing against closing the issue. These unexplained comparisons (which seems to be a mistake after all) play their role of partially correcting the non-optimality of the threshold-based A* implementation, while not doing it much so that the performance is affected.

Just for completeness, the change considered were:

Replacing the conditions x > pot + lx in lines lines 551-560 of navfn.cpp by x == POT_HIGH saves several floating-point operations per call to updateCellAstar and does not change the results in binary maps like those included in the nav2_system_tests directory, where the original condition is never satisfied without x being POT_HIGH. In general, the previous phrase is true except when updating cells that are very close to the start position (where the heuristic does not make the comparison unfair), whose distance can be bound by the cost variability and the map size. However, the change may produce longer and more expensive paths for maps with the appropriate shape and significant cost differences.
Removing the statement pot += dist and replacing pot by pot + dist in the condition of line 551 removes an assignment, gives sense to the comparisons, and makes the navigation function more accurate. However, the performance may be worsened because nodes are revisited more often, and the algorithm will still be non-optimal. Moreover, a better navigation function does not necessarily produces shorter paths.

The statement in the first message saying that the comparison are superfluous because the heuristic is admissible (nuanced later in the same message) is not true, because this implementation does not consider the nodes in order. For efficiency, the nodes in each iteration are processed in the order they were inserted in the queue, regardless of their costs, and the range of costs handled at the same time may increase in every iteration. Paths produced by this algorithm may be much longer than the optimal one in the worst case, but this probably does not usually happen in practice.

SteveMacenski commented 3 years ago

Thanks for the analysis! Per your recommendations, I'm closing the issue. I definitely recognize there may be some oddities in NavFn, but its been in use for so long by so many people that any changes to it would have to be matched with some measurable / tangible improvements in performance or paths to be merged. It doesn't sound like these changes would have any significant change so I'd rather maintain historical behavior in the absence of much practical benefit.

I'm generally much more open to changes across the stack, but NavFn is one of those historic projects that I don't think it would be good for us to poke too much unless we think we can make significant improvements in some respect.

ros-navigation / navigation2