marota / Oracle4Grid

Documentation at https://oracle4grid.readthedocs.io/en/latest/
Mozilla Public License 2.0

Bad replay - Oracle agent does not come back toward reference configuration #27

Open marota opened 3 years ago

marota commented 3 years ago

Suppose the best path, when playing the oracle, contains these two grid2op actions: grid2op_A = action_1 + action_2 and grid2op_B = action_1 + action_2 + action_3.

If we have grid2op_B at timestep t and grid2op_A at timestep t+1, replay will play:

We need to test that. The topo_vect from the graph_node configurations and the one from the replay agent should be the same at each timestep.
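A minimal sketch of such a check, assuming the expected topologies (from the graph nodes) and the replayed ones (from the replay agent's observations) have already been collected as one vector per timestep. The helper name `first_topology_mismatch` and its arguments are hypothetical and not part of Oracle4Grid or grid2op.

```python
def first_topology_mismatch(expected_topos, replayed_topos):
    """Return the first timestep where the two topology vectors differ, or None.

    Both arguments are sequences holding one topology vector per timestep
    (hypothetical inputs, e.g. topo_vect values gathered beforehand).
    """
    if len(expected_topos) != len(replayed_topos):
        raise ValueError("both runs must cover the same number of timesteps")
    for t, (expected, replayed) in enumerate(zip(expected_topos, replayed_topos)):
        # Compare element by element; tuples make lists and arrays comparable.
        if tuple(expected) != tuple(replayed):
            return t
    return None


# Toy data: the replay diverges from the graph at timestep 1.
graph_topos = [[1, 1, 2], [1, 1, 1]]
replay_topos = [[1, 1, 2], [1, 2, 1]]
print(first_topology_mismatch(graph_topos, replay_topos))
```

Returning the first diverging timestep, rather than a plain boolean, makes it easier to see which transition in the action path was replayed incorrectly.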

marota commented 3 years ago

This is also connected to this comment: https://github.com/marota/Oracle4Grid/issues/24#issuecomment-805836046

marota commented 3 years ago

For that, take an always-changing action path from the graph and make sure it is replayed correctly. We can use DICT_GAME_PARAMETERS_SIMULATION in replay for easier comparison. The rewards should be the same.

marota commented 3 years ago

When we go from action_Oracle_A to action_Oracle_B, delta_action = action3.

When we go from action_Oracle_B to action_Oracle_A, delta_action = inverse(action3) = initial_configuration(asset3).
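The two transitions can be sketched as a set difference. This is only an illustration: modelling an oracle action as a set of atomic action names is an assumption, and `delta_atomic_actions` is a hypothetical helper, not the OracleAgent code.

```python
def delta_atomic_actions(previous, current):
    """Split a transition between two oracle actions into (to_apply, to_undo).

    Each oracle action is modelled as a set of atomic action names
    (an assumption made for this sketch).
    """
    previous, current = frozenset(previous), frozenset(current)
    to_apply = current - previous  # atomic actions that are new at this timestep
    to_undo = previous - current   # atomic actions to revert to their initial configuration
    return to_apply, to_undo


action_oracle_a = {"action_1", "action_2"}
action_oracle_b = {"action_1", "action_2", "action_3"}

# A -> B: only action_3 is new, nothing has to be undone.
print(delta_atomic_actions(action_oracle_a, action_oracle_b))
# B -> A: nothing is new, action_3 has to be reverted.
print(delta_atomic_actions(action_oracle_b, action_oracle_a))
```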

NMegel commented 3 years ago

Done in commit e5eb05cde213ab96942b109fdbb94699c7eac5d2

The OracleAgent now keeps a memory of whether the previous action was legal.

Two tests have been implemented.

marota commented 3 years ago

It seems that if a different action is played at the same substation, a cancelling action would still be created, according to

https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/core/agent/OracleAgent.py#L97

But it shouldn't: we are just changing the configuration at the same substation.

marota commented 3 years ago

In this test:

https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/test/integration_test.py#L378

Many actions might be illegal, as the action path is not taken from the graph:

https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/test/integration_test.py#L404

We should probably check how many there are and whether several actions are indeed implemented.

NMegel commented 3 years ago

All these actions are legal because the environment is loaded with easy default game rules.

With env.action_space._is_legal(action, obs._obs_env), we see that actions are always legal at each timestep. So they are all correctly applied and undone.
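A minimal sketch of how the illegal timesteps could be counted along an action path. The helper `illegal_timesteps` and the `is_legal` callable are hypothetical; in a real run the callable would wrap the check quoted above, and it is assumed here to return a plain boolean.

```python
def illegal_timesteps(action_path, is_legal):
    """Return the timesteps whose action is rejected by `is_legal`.

    `is_legal` is a caller-supplied callable (hypothetical) that must return
    a plain boolean; if the underlying check returns more than a boolean,
    wrap it so that only the boolean is passed back.
    """
    return [t for t, action in enumerate(action_path) if not is_legal(action)]


# Toy stand-in for the environment check: only "bad" is considered illegal.
toy_path = ["ok", "bad", "ok", "bad"]
rejected = illegal_timesteps(toy_path, lambda action: action != "bad")
print(len(rejected), rejected)
```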

NMegel commented 3 years ago

Added a correction in c9afedb9fc6148cb31ac8783cb58bc4fd6c44321 because env was generated twice, but it doesn't change anything in the test results.

marota commented 3 years ago

> All these actions are legal because the environment is loaded with easy default game rules.
>
> With env.action_space._is_legal(action, obs._obs_env), we see that actions are always legal at each timestep. So they are all well done and undone.

I see it now, but it is not really an integration test anymore, as the action path should rather be taken from the graph with DICT_GAME_PARAMETERS_GRAPH. It is a good unit test though.

marota commented 3 years ago

> It seems that if a different action is played at the same substation, a cancelling action would still be created according to
>
> https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/core/agent/OracleAgent.py#L97
>
> But it shouldn't, we are just changing the configuration at the same substation

What about this problem?

NMegel commented 3 years ago

I am not sure I understand the problem.

Let's suppose we have 2 atomic actions on sub 5. Their representations are sub-5-0 and sub-5-1.

For example, if we want to apply:

In this situation, OracleAgent will have to undo sub-5-0. This is what it does at line 97 when searching for previous atomic actions that don't belong to the current actions.
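The two readings discussed in this thread can be compared in a small sketch. It is not the code at line 97: the function names, the `skip_reconfigured_assets` flag and the `line-3-0` name are hypothetical, and the second reading assumes that two atomic actions on the same substation replace one another, which the thread does not settle.

```python
def parse_atomic(name):
    """Split a name such as 'sub-5-0' into (asset_type, asset_id, index)."""
    asset_type, asset_id, index = name.split("-")
    return asset_type, int(asset_id), int(index)


def atomic_actions_to_undo(previous, current, skip_reconfigured_assets=False):
    """Return the previous atomic actions that need an explicit cancelling action.

    With skip_reconfigured_assets=False, every previous atomic action missing
    from `current` is undone. With True, an action is not undone when `current`
    already sets another configuration on the same asset.
    """
    current = set(current)
    touched_assets = {parse_atomic(name)[:2] for name in current}
    to_undo = []
    for name in previous:
        if name in current:
            continue  # still wanted, nothing to cancel
        if skip_reconfigured_assets and parse_atomic(name)[:2] in touched_assets:
            continue  # same asset is reconfigured by a current action
        to_undo.append(name)
    return sorted(to_undo)


previous_actions = ["sub-5-0", "line-3-0"]
current_actions = ["sub-5-1"]
print(atomic_actions_to_undo(previous_actions, current_actions))
print(atomic_actions_to_undo(previous_actions, current_actions, skip_reconfigured_assets=True))
```

With the flag off, both `sub-5-0` and `line-3-0` are cancelled; with it on, only `line-3-0` is, since sub 5 is simply given a new configuration.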