Bad replay - Oracle agent does not come back toward reference configuration

marota / Oracle4Grid

Documentation at https://oracle4grid.readthedocs.io/en/latest/

Mozilla Public License 2.0

Bad replay - Oracle agent does not come back toward reference configuration #27

Open marota opened 3 years ago

marota commented 3 years ago

Let's say we have these two grid2op actions in the best path when playing the oracle: grid2op_A = action_1 + action_2 and grid2op_B = action_1 + action_2 + action_3

In the case where we have grid2op_B at timestep t and grid2op_A at timestep t+1, replay will play:

at t, action_1 + action_2 + action_3, so the grid will be in config_t = (topo_init + action_1 + action_2 + action_3)
at t+1, action_1 + action_2, so the grid will be in (config_t + action_1 + action_2) = config_t instead of config_(t+1) = (topo_init + action_1 + action_2). action_3 was not undone.

We need to test that: the topo_vect from the graph_node configurations and the one from the replay agent should be the same at each timestep.
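To make the failure concrete, here is a minimal sketch in a toy model where a topology is just a dict from asset name to bus. Every name in it (TOPO_INIT, combine, naive_replay, the asset names) is hypothetical and only illustrates the reasoning above; it is not the Oracle4Grid or grid2op API.

```python
# Toy model: a topology maps an asset name to its bus (1 or 2).
# All names are hypothetical; this is not the Oracle4Grid or grid2op API.
TOPO_INIT = {"asset_1": 1, "asset_2": 1, "asset_3": 1}

# Each atomic action moves one asset to bus 2.
ACTION_1 = {"asset_1": 2}
ACTION_2 = {"asset_2": 2}
ACTION_3 = {"asset_3": 2}


def combine(*actions):
    """Merge atomic actions into one composite action."""
    merged = {}
    for action in actions:
        merged.update(action)
    return merged


def expected_topo(oracle_action):
    """Configuration stored in the graph node: topo_init plus the action."""
    return {**TOPO_INIT, **oracle_action}


def naive_replay(path):
    """Apply each composite action on top of the current grid, undoing nothing."""
    topo = dict(TOPO_INIT)
    history = []
    for oracle_action in path:
        topo = {**topo, **oracle_action}
        history.append(dict(topo))
    return history


grid2op_A = combine(ACTION_1, ACTION_2)
grid2op_B = combine(ACTION_1, ACTION_2, ACTION_3)
path = [grid2op_B, grid2op_A]

replayed = naive_replay(path)
expected = [expected_topo(action) for action in path]

# Timesteps where the replayed topology differs from the graph configuration.
mismatches = [t for t, (r, e) in enumerate(zip(replayed, expected)) if r != e]
print(mismatches)  # [1]
```

In this sketch the mismatch appears at the second timestep only, because asset_3 stays on bus 2 after grid2op_A is played.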

marota commented 3 years ago

This is connected to this comment as well https://github.com/marota/Oracle4Grid/issues/24#issuecomment-805836046

marota commented 3 years ago

For that, take an always-changing action path from the graph and make sure it is replayed correctly. We can then use DICT_GAME_PARAMETERS_SIMULATION in replay for easier comparison. The rewards should be the same.

marota commented 3 years ago

When we go from action_Oracle_A to action_Oracle_B, delta_action = action3.
When we go from action_Oracle_B to action_Oracle_A, delta_action = inverse(action3) = initial_configuration(asset3).
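A minimal sketch of that delta computation, in a toy model where an action maps an asset to a bus. The function name delta_action matches the wording above, but its signature, the dict representation and TOPO_INIT are assumptions made for illustration, not the actual OracleAgent code.

```python
# Toy model (hypothetical, not the Oracle4Grid API): an action maps asset -> bus.
TOPO_INIT = {"asset1": 1, "asset2": 1, "asset3": 1}
action_Oracle_A = {"asset1": 2, "asset2": 2}
action_Oracle_B = {"asset1": 2, "asset2": 2, "asset3": 2}


def delta_action(previous, current, topo_init):
    """Return the action to play when moving from `previous` to `current`."""
    delta = {}
    # Assets whose target in the new action differs from the previous one.
    for asset, bus in current.items():
        if previous.get(asset) != bus:
            delta[asset] = bus
    # Assets no longer touched go back to their initial configuration.
    for asset in previous:
        if asset not in current:
            delta[asset] = topo_init[asset]
    return delta


delta_a_to_b = delta_action(action_Oracle_A, action_Oracle_B, TOPO_INIT)
delta_b_to_a = delta_action(action_Oracle_B, action_Oracle_A, TOPO_INIT)
print(delta_a_to_b)  # {'asset3': 2}
print(delta_b_to_a)  # {'asset3': 1}
```

Going from B back to A resets asset3 to its initial bus, which is the inverse(action3) case described above.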

NMegel commented 3 years ago

Done in commit e5eb05cde213ab96942b109fdbb94699c7eac5d2

A memory has been set in the OracleAgent with regard to whether the previous action was legal.

2 tests have been implemented

"test_cancelling_action", which is a unit test on a simple 3-timestep path
"test_final_topo_replay", which is more of an integration test: we choose a random 100-timestep path with 6 atomic_actions and a max_depth of 4. You were right Antoine, it is best to test a large number of use cases. It is easy to verify that the last state has to be the initial state to which we apply the last action! We test that the final topo vect and final line status are consistent with that.
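The idea behind the second test can be sketched without grid2op, in a toy model where each atomic action moves one asset to bus 2. This is not the code of test_final_topo_replay; the constants mirror the figures quoted above, and random_path, replay_with_undo and the asset names are hypothetical.

```python
import random

N_ASSETS = 6        # one atomic action per asset, as with the 6 atomic_actions
MAX_DEPTH = 4       # at most 4 atomic actions combined per timestep
N_TIMESTEPS = 100

TOPO_INIT = {f"asset_{i}": 1 for i in range(N_ASSETS)}


def random_path(rng):
    """Draw one random composite action (a set of assets on bus 2) per timestep."""
    assets = sorted(TOPO_INIT)
    return [
        frozenset(rng.sample(assets, rng.randint(1, MAX_DEPTH)))
        for _ in range(N_TIMESTEPS)
    ]


def replay_with_undo(path):
    """Replay the path, cancelling atomic actions that are no longer requested."""
    topo = dict(TOPO_INIT)
    previous = frozenset()
    for current in path:
        for asset in previous - current:   # undo what disappeared
            topo[asset] = TOPO_INIT[asset]
        for asset in current - previous:   # apply what is new
            topo[asset] = 2
        previous = current
    return topo


rng = random.Random(0)
path = random_path(rng)
final_topo = replay_with_undo(path)

# The final state must be the initial state with only the last action applied.
expected_final = {**TOPO_INIT, **{asset: 2 for asset in path[-1]}}
assert final_topo == expected_final
```

Checking only the final state keeps the assertion simple, at the cost of not pinpointing the timestep where a replay would first diverge.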

marota commented 3 years ago

It seems that if a different action is played at the same substation, a cancelling action would still be created, according to https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/core/agent/OracleAgent.py#L97 But it shouldn't: we are just changing the configuration at the same substation.

marota commented 3 years ago

In this test, https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/test/integration_test.py#L378, many actions might turn out to be illegal, as the action path is not a path taken from the graph: https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/test/integration_test.py#L404 We should probably check how many there are and whether several actions are indeed implemented.

NMegel commented 3 years ago

All these actions are legal because the environment is loaded with easy default game rules

With env.action_space._is_legal(action, obs._obs_env), we see that actions are always legal at each timestep. So they are all well done and undone.

NMegel commented 3 years ago

Added a correction in c9afedb9fc6148cb31ac8783cb58bc4fd6c44321 because env was generated twice, but it doesn't change anything in the test results.

marota commented 3 years ago

All these actions are legal because the environment is loaded with easy default game rules

With env.action_space._is_legal(action, obs._obs_env), we see that actions are always legal at each timestep. So they are all well done and undone.

I see it now, but it is not really an integration test anymore, as the action path should rather be taken from the graph with DICT_GAME_PARAMETERS_GRAPH. It is a good unit test though.

marota commented 3 years ago

It seems that if a different action is played at the same substation, a cancelling action would still be created according to

https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/core/agent/OracleAgent.py#L97

But it shouldn't, we are just changing the configuration at the same substation

What about this problem?

NMegel commented 3 years ago

I am not sure I understand the problem.

Let's suppose we have 2 atomic actions on sub 5. Their representations are sub-5-0 and sub-5-1

For example, if we want to apply:

['sub-5-0','sub-5-1'] at timestep t
['sub-5-1'] at timestep t+1

In this situation, OracleAgent will have to undo sub-5-0. This is what it does at line 97 when checking whether there are previous atomic actions that don't belong to the current actions.
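As a rough sketch of the two behaviours discussed in this thread: the first function lists the previous atomic actions that are absent from the current ones, which is the check described just above. The second is one possible reading of the concern about same-substation changes, where the undo is skipped when a newly played atomic action targets the same substation. Both functions are hypothetical, as is the assumption that names follow a 'sub-<substation id>-<index>' pattern; this is not the OracleAgent code.

```python
def actions_to_undo(previous, current):
    """Atomic actions played at the previous timestep that are absent now."""
    return [name for name in previous if name not in current]


def substation_of(name):
    """Assumes names follow the 'sub-<substation id>-<index>' pattern."""
    kind, sub_id, _index = name.split("-")
    return kind, int(sub_id)


def actions_to_undo_skipping_overrides(previous, current):
    """Variant: skip the undo when a new atomic action hits the same substation."""
    new_targets = {substation_of(n) for n in current if n not in previous}
    return [
        name
        for name in actions_to_undo(previous, current)
        if substation_of(name) not in new_targets
    ]


# Example from this comment: sub-5-0 disappears, so it has to be undone.
print(actions_to_undo(["sub-5-0", "sub-5-1"], ["sub-5-1"]))  # ['sub-5-0']

# sub-5-0 is replaced by a new action on the same substation: no undo in the variant.
print(actions_to_undo_skipping_overrides(["sub-5-0"], ["sub-5-1"]))  # []
```

Whether the variant is appropriate depends on whether the new atomic action fully overrides the configuration set by the old one at that substation, which this sketch does not check.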