Dominance check for elementary paths

steveharenberg commented 2 years ago

Describe the bug For elementary paths, how should the dominance check be defined? Some branches have this defined a different way. In master it's currently defined as:

https://github.com/torressa/cspy/blob/1c25c1064cd492e997d053e1248caf81646f995a/src/cc/labelling.cc#L200-L213

I think patch90 is most correct, where it is defined as: https://github.com/torressa/cspy/blob/dcea995db298c6a59bba42d7cfc792dc7fd2048f/src/cc/labelling.cc#L244-L255

Suggestions The std::includes should have a negation in front of it, right?

That is, if we have two labels, L1 and L2, with unreachable node sets U1 and U2, then L1 dominates L2 iff:

r1 <= r2 for each resource in L1, L2 respectively
U1 is a subset of U2

In which case, should the check be the following?

  if (params_ptr->elementary && unreachable_nodes.size() > 0 &&
      other.unreachable_nodes.size() > 0) {
    if (!std::includes(
            other.unreachable_nodes.begin(),
            other.unreachable_nodes.end(),
            unreachable_nodes.begin(),
            unreachable_nodes.end())) {
      return false;
    }
  }

which will check if U1 is a subsequence of U2 (or equal). If U1 is not a subsequence (or equal) L1 does not dominate L2 and thus returns false. Note the difference with patch90 is changing the iterator order and removing equality check (which I believe is redundant).

torressa commented 2 years ago

Thanks for raising this! I always get confused.

I think the version in master is correct (apart from the equality check which is wrong). Just pushed a new unit test to the monotonic-resource-checks branch. Please check that it makes sense. https://github.com/torressa/cspy/blob/00faec430ba5664b394ab1a4de6d64ffa8d2dbaa/test/cc/src/test_labelling.cc#L36

Your proposed version works fails on the new unit test, and also on the benchmarks (change the path in test/cc/test_benchmarks and then make benchmark). It checks that U1 is not a subsequence of U2 which is not the same as checking that U2 is a subsequence of U1. As the sets might just be completely unrelated which would be falsely dominated, as it happens in the first test https://github.com/torressa/cspy/blob/00faec430ba5664b394ab1a4de6d64ffa8d2dbaa/test/cc/src/test_labelling.cc#L49

I think we need to check U2 is a subsequence of U1 explicitly which means checking std::includes(this, other) as in master. But for some reason std::includes(other, this) also works on the benchmarks but not with the new unit test. Also I think we need to check for equality, otherwise when U1 = U2 we may end up with neither label dominating, as it happens on https://github.com/torressa/cspy/blob/00faec430ba5664b394ab1a4de6d64ffa8d2dbaa/test/cc/src/test_labelling.cc#L68

steveharenberg commented 2 years ago

Hah yes it is confusing. Thanks for checking and the explanation! I hope I am wrong because the performance is much better with the master version (as opposed to patch90 version).

I am new to the area, but my interpretation from Righini and Salani, 2006 was that a label would dominate another label only if the partial path were a subset of the other partial path. Therefore, if you have two different sets of nodes, I would expect neither label to dominate. So, I would have expected the following to return false: https://github.com/torressa/cspy/blob/00faec430ba5664b394ab1a4de6d64ffa8d2dbaa/test/cc/src/test_labelling.cc#L49

So, is it the case that if you have two different sets of partial paths you can say the label dominates the other for elementary paths?

Aside from the above, do we need to pass other.unreachable_nodes.cend() to std::equal as well? If other.unreachable_nodes is smaller, it may lead to a bug?

torressa commented 2 years ago

I am new to the area, but my interpretation from Righini and Salani, 2006 was that a label would dominate another label only if the partial path were a subset of the other partial path. Therefore, if you have two different sets of nodes, I would expect neither label to dominate. So, I would have expected the following to return false:

I think that's totally correct, in this case since unreachable_nodes are not comparable, the resources is what's making L2 dominate as res2[1] < res[1]. Similarly with the case when the unreachable sets are the same in https://github.com/torressa/cspy/blob/00faec430ba5664b394ab1a4de6d64ffa8d2dbaa/test/cc/src/test_labelling.cc#L68

I tried making them as similar as possible, as if they are the same, the elementary check is not performed.

So, is it the case that if you have two different sets of partial paths you can say the label dominates the other for elementary paths?

If all the other criteria is satisfied, weight, and individual resource, yep!

What is done there is not only using the partial path but the unreachable nodes (which is just nodes that are in partial path + unreachable due to resources). It was introduced in Feillet et al. (2004), also referenced in the classic Irnich and Desaulniers (2005) (page 18 section titled ESPPRC).

Aside from the above, do we need to pass other.unreachable_nodes.cend() to std::equal as well? If other.unreachable_nodes is smaller, it may lead to a bug?

You're right here, I didn't know that std::equal allowed that. That'd be safer.

steveharenberg commented 2 years ago

So, is it the case that if you have two different sets of partial paths you can say the label dominates the other for elementary paths?

If all the other criteria is satisfied, weight, and individual resource, yep!

Ok cool. I was thinking about this incorrectly. Page 18 in your source helped me understand it, thanks! Feel free to close this. Your unit test looks good to me and the code looks good (other than adding that part to std::equal).

torressa commented 2 years ago

Sweet, thank you very much for checking!

steveharenberg commented 2 years ago

I accidentally posted this in #96 , but I meant to post it here, I am going to delete that comment and repost here.

Hey @torressa, I hit an example with the wrong result and I think this dominance check is not correct for elementary... at least when you have negative costs (or negative resource consumptions). Here is a small example showing incorrect output:

max_res = [100]
min_res = [0]

H = DiGraph(directed=True, n_res=1)
H.add_nodes_from(['Source', 1, 2, 3, 4, 'Sink'])
H.add_edge('Source', 1, res_cost=[1], weight=1)
H.add_edge('Source', 2, res_cost=[1], weight=1)
H.add_edge('Source', 4, res_cost=[1], weight=100)
H.add_edge(1, 3, res_cost=[1], weight=10)
H.add_edge(1, 'Sink', res_cost=[1], weight=1)
H.add_edge(2, 3, res_cost=[1], weight=5)
H.add_edge(2, 'Sink', res_cost=[1], weight=1)
H.add_edge(3, 1, res_cost=[1], weight=-10)
H.add_edge(3, 2, res_cost=[1], weight=-100)
H.add_edge(4, 3, res_cost=[1], weight=1)

# The optimal path should be ['Source', 1, 3, 2, 'Sink']
# -88.0
bidirec = BiDirectional(H, max_res, min_res, elementary=True, direction="forward")
bidirec.run()
print(bidirec.path)
print(bidirec.total_cost)

This returns the wrong result because the label with partial path L1=(Source, 2, 3) dominates the label with partial path L2=(Source, 1, 3).

There are no resources and L1.cost = 6 < L2.cost = 11. However, if we were able to expand L2 we would have been able to hit the large negative value link (3,2,-100) which is needed for the optimal path.

Based on an example like this, I don't see how we can prune elementary paths based on dominance checks. With elementary, the partial path matters; e.g., L1 cannot expand into any paths that visit node 2 for instance. If the link weights are only positive, that would be fine because there would be no reason for L2 to expand to node 2. Unfortunately, if there are links that are negative, we don't know whether the label can expand into a state with better cost until it's completely expanded.

Am I missing anything?

torressa commented 2 years ago

Aha! That's interesting! This goes back to what we were talking about before with unrelated partial paths. I think it's clearer if we write the check like this (edit going back full circle to your first comment https://github.com/torressa/cspy/issues/94#issue-1107332812)

  if (params_ptr->elementary && unreachable_nodes.size() > 0 &&
      other.unreachable_nodes.size() > 0) {
    // check unreachable_nodes is a subsequence of other.unreachable_nodes
    if (std::includes(
            other.unreachable_nodes.cbegin(),
            other.unreachable_nodes.cend(),
            unreachable_nodes.cbegin(),
            unreachable_nodes.cend())) {
      // Dominated cause all conditions are satisfied
      return true;
    } else {
      return false;
    }
  }
  return true;

This fixes the example as neither label would dominate but leads to a massive performance decrease (for benchmarks which are all positive costs). I think this is a special case as we have a few negative cost cycles. I'll have a think about this.

steveharenberg commented 2 years ago

Yes, sadly a big performance hit :(. I think we also have to do this if there are negative resource consumptions as well. You could envision a similar scenario.

And yes, that code makes sense to me.

torressa commented 2 years ago

Just for sanity I've rewritten the elementary check explicitly as in the paper. Seems to agree with the previous super slow case. I've added a check to not use elementary logic if no negative cost cycles are found, which I think is OK.

torressa commented 2 years ago

Reverted back again as it the set of unreachable nodes is more efficient (don't have to loop over all nodes dominance check).

I'm just working on some other bits to speed things up (motivated by Ruslan's answer in or.stackexchange). Adding the efficient labels as described (attempted in runDominanceEff). This keeps the labels sorted per node, so joining procedure can be stopped early (stills needs checking). Also #78 might be worth a shot.

Also getting it to work on VRPy (adding non-elementary route handling was pretty essential anyway), after that I'll release.

torressa / cspy

Dominance check for elementary paths #94