Heuristic function overestimates cost, leading to suboptimal plans.

steadmon commented 7 years ago

A* with a closed set can select suboptimal plans if the heuristic function overestimates the remaining cost of an intermediate state. The current heuristic of counting the conflicting states can be an overestimate in some cases. Consider the following setup:

Six state values: a, b, c, d, e, f Initial state: all six values are False Goal state: all six values are True Four actions:

"setA", no preconditions, sets a:=True, cost of 2
"setAB", no preconditions, sets a:=True and b:=True, cost of 1
"optimal", precondition a==True and b==False, sets all six state values to True, cost of 1
"trap", precondition a==True and b==False, sets all six state values to True, cost of 5

The optimal plan is "setA" followed by "optimal", with a total cost of 3. However, with the current heuristic, the code actually selects a plan of "setAB" followed by "trap", with a total cost of 6. This is because the heuristic overestimates the remaining cost after the action "setA" as having a likely remaining cost of 5, when in fact the remaining cost is only 1.

This can be fixed by changing the heuristic function as follows:

After all actions and costs are known, find the action with the minimum (cost / number of output states) ratio. Save this optimal ratio, and pass it as an additional argument to the heuristic function. The heuristic function should then return the number of conflicting states multiplied by the cost/output ratio.

stolk commented 7 years ago

Good point. Thanks for pointing this out.

Sorrien commented 6 years ago

Has this been resolved?

stolk commented 6 years ago

This issue is still present in the current code base.

You could follow steadmon's approach, or for an even quicker fix: returning a constant value 1 (or better: the lowest cost of all actions) for heuristic will also get rid of suboptimal paths.

To thoroughly fix, you would have to determine lowest cost of flipping on a

per atom basis
separately for both flip directions: 0->1 and 1->0.

Then dynamically, add this cost for each differing atom value based on direction.

This will slow down the calculation of the heuristic of course, so if you can live with occasional sub optimal paths, the current code may be better.

Note: If you decide to go steadmon's route, you will need to use floating point values for the heuristic.

kampeador commented 2 years ago

#include <stdio.h>
#include <assert.h>

#include "goap.h"
#include "astar.h"

void check_goap(bool err)
{
    assert(err == true);
}

int main(int argc, char* argv[]) {
    actionplanner_t ap;
    goap_actionplanner_clear(&ap);

    // intial state
    worldstate_t ws_initial;
    goap_worldstate_clear(&ws_initial);
    check_goap(goap_worldstate_set(&ap, &ws_initial, "a", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "b", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "c", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "d", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "e", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "f", false));

    // goal state
    worldstate_t ws_goal;
    goap_worldstate_clear(&ws_goal);
    check_goap(goap_worldstate_set(&ap, &ws_goal, "a", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "b", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "c", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "d", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "e", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "f", true));

    // actions
    // "setA", no preconditions, sets a:=True, cost of 2
    goap_set_pst(&ap, "setA", "a", true);
    goap_set_cost(&ap, "setA", 2);

    // "setAB", no preconditions, sets a:=True and b:=True, cost of 1
    goap_set_pst(&ap, "setAB", "a", true);
    goap_set_pst(&ap, "setAB", "b", true);
    goap_set_cost(&ap, "setAB", 1);

    // "optimal", precondition a==True and b==False, sets all six state values to True, cost of 1
    goap_set_pre(&ap, "optimal", "a", true);
    goap_set_pre(&ap, "optimal", "b", false);
    goap_set_pst(&ap, "optimal", "a", true);
    goap_set_pst(&ap, "optimal", "b", true);
    goap_set_pst(&ap, "optimal", "c", true);
    goap_set_pst(&ap, "optimal", "d", true);
    goap_set_pst(&ap, "optimal", "e", true);
    goap_set_pst(&ap, "optimal", "f", true);
    goap_set_cost(&ap, "optimal", 1);

    // "trap", precondition a==True and b==False, sets all six state values to True, cost of 5
    goap_set_pre(&ap, "trap", "a", true);
    goap_set_pre(&ap, "trap", "b", false);
    goap_set_pst(&ap, "trap", "a", true);
    goap_set_pst(&ap, "trap", "b", true);
    goap_set_pst(&ap, "trap", "c", true);
    goap_set_pst(&ap, "trap", "d", true);
    goap_set_pst(&ap, "trap", "e", true);
    goap_set_pst(&ap, "trap", "f", true);
    goap_set_cost(&ap, "trap", 5);

    char desc[4096];
    worldstate_t states[16];
    const char* plan[16];
    int plansz = 16;
    const int plancost = astar_plan(&ap, ws_initial, ws_goal, plan, states, &plansz);
    LOGI( "plancost = %d", plancost );
    goap_worldstate_description(&ap, &ws_initial, desc, sizeof(desc));
    LOGI( "%-23s%s", "", desc );
    for (int i = 0; i < plansz && i < 16; ++i)
    {
        goap_worldstate_description(&ap, states+i, desc, sizeof(desc));
        LOGI("%d: %-20s%s", i, plan[i], desc);
    }

    puts("Exit success");
    fflush(stdout);

    return 0;
}

Output:

plancost = 3
                       a,b,c,d,e,f,
0: setA                A,b,c,d,e,f,
1: optimal             A,B,C,D,E,F,

Am I missing something?

stolk commented 2 years ago

Maybe OP's example was not a good one, but it is a valid point that overestimating the remaining cost can cause suboptimal plans.

Personally, I don't worry much about those cases. In practice, they probably don't occur often, and if they do, you still end up with a working plan.

If you are hell-bent on always getting the best plan, you can make the search un-directed, and always put the estimate at '1'. But then you get a combinatorial explosion.

Playing with the heuristic is left as an exercise for the reader. Use what works for you. What's in the repo now works for me.

kampeador commented 2 years ago

It would be nice if issues like this one had practical unit test coverage, rather than theoretical.

stolk / GPGOAP

Heuristic function overestimates cost, leading to suboptimal plans. #9