marota opened this issue 3 years ago
This is also connected to this comment: https://github.com/marota/Oracle4Grid/issues/24#issuecomment-805836046
To test this, take an action path from the graph in which the action changes at every timestep, and make sure it is replayed correctly. We can then use DICT_GAME_PARAMETERS_SIMULATION in the replay for easier comparison: the rewards should be the same.
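A minimal sketch of the reward comparison, assuming the oracle run and the replay each produce one reward per timestep, computed with the same game parameters (e.g. DICT_GAME_PARAMETERS_SIMULATION for both). The helper name `first_reward_mismatch` and the tolerances are hypothetical, not part of Oracle4Grid:

```python
import math
from typing import Optional, Sequence


def first_reward_mismatch(
    oracle_rewards: Sequence[float],
    replay_rewards: Sequence[float],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-6,
) -> Optional[int]:
    """Return the first timestep whose rewards differ, or None if all match.

    Hypothetical helper: both sequences are assumed to hold one reward per
    timestep, computed with the same game parameters.
    """
    for t, (expected, actual) in enumerate(zip(oracle_rewards, replay_rewards)):
        # Floats are compared with a tolerance rather than strict equality.
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            return t
    if len(oracle_rewards) != len(replay_rewards):
        # One run stopped earlier than the other (e.g. a game over).
        return min(len(oracle_rewards), len(replay_rewards))
    return None
```

In a test, `assert first_reward_mismatch(oracle_rewards, replay_rewards) is None` then reports which timestep diverged when it fails.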
When we go from action_Oracle_A to action_Oracle_B, delta_action = action3. When we go from action_Oracle_B to action_Oracle_A, delta_action = inverse(action3) = initial_configuration(asset3).
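The delta rule above can be sketched as follows, under a hypothetical representation that is not OracleAgent's actual data structure: an oracle action is a dict mapping each asset it touches to the configuration index it sets, and `delta_action` is an illustrative name:

```python
from typing import Dict

# Hypothetical representation: asset name -> configuration index set on it.
OracleAction = Dict[str, int]


def delta_action(
    previous: OracleAction,
    current: OracleAction,
    initial_configuration: OracleAction,
) -> OracleAction:
    """Return the asset changes needed to go from `previous` to `current`."""
    delta: OracleAction = {}
    # Assets whose configuration in `current` differs from the one set by `previous`.
    for asset, config in current.items():
        if previous.get(asset) != config:
            delta[asset] = config
    # Assets touched only by `previous` return to their initial configuration,
    # i.e. inverse(action3) = initial_configuration(asset3).
    for asset in previous:
        if asset not in current:
            delta[asset] = initial_configuration[asset]
    return delta


init = {"asset1": 0, "asset2": 0, "asset3": 0}
action_oracle_a = {"asset1": 1, "asset2": 1}
action_oracle_b = {"asset1": 1, "asset2": 1, "asset3": 1}
print(delta_action(action_oracle_a, action_oracle_b, init))  # {'asset3': 1}
print(delta_action(action_oracle_b, action_oracle_a, init))  # {'asset3': 0}
```

Because this sketch is keyed by asset, moving from `{"sub5": 0}` to `{"sub5": 1}` yields only `{"sub5": 1}`, with no separate cancelling action.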
Done in commit e5eb05cde213ab96942b109fdbb94699c7eac5d2
The OracleAgent now keeps a memory of whether the previous action was legal.
Two tests have been implemented.
It seems that if a different action is played at the same substation, a cancelling action would still be created, according to https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/core/agent/OracleAgent.py#L97. But it shouldn't be: we are just changing the configuration at the same substation.
In this test (https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/test/integration_test.py#L378), many actions might be illegal, as the action path is not taken from the graph (https://github.com/marota/Oracle4Grid/blob/e5eb05cde213ab96942b109fdbb94699c7eac5d2/oracle4grid/test/integration_test.py#L404). We should probably check how many there are and whether several actions are actually applied.
All these actions are legal because the environment is loaded with the easy default game rules.
With env.action_space._is_legal(action, obs._obs_env), we see that the actions are legal at every timestep, so they are all correctly applied and undone.
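A small sketch for counting illegal actions along a path, with the legality check passed in as a predicate. With grid2op, the predicate used in this thread would be `lambda action, obs: env.action_space._is_legal(action, obs._obs_env)`; note that `_is_legal` is a private method and may change between grid2op versions. Pairing each action with the observation it is played from, and the helper name `illegal_timesteps`, are assumptions for illustration:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

ActionT = TypeVar("ActionT")  # action type (e.g. a grid2op action)
ObsT = TypeVar("ObsT")  # observation type (e.g. a grid2op observation)


def illegal_timesteps(
    steps: Iterable[Tuple[ActionT, ObsT]],
    is_legal: Callable[[ActionT, ObsT], bool],
) -> List[int]:
    """Return the timesteps whose action is reported illegal by `is_legal`.

    Each element of `steps` pairs the action played at timestep t with the
    observation it is played from.
    """
    return [t for t, (action, obs) in enumerate(steps) if not is_legal(action, obs)]
```

`len(illegal_timesteps(steps, is_legal))` gives the count, and the returned indices show where the illegal actions occur.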
Added a correction in c9afedb9fc6148cb31ac8783cb58bc4fd6c44321 because the env was generated twice, but it doesn't change the test results.
I see it now, but it is not really an integration test anymore, as the action path should instead be taken from the graph with DICT_GAME_PARAMETERS_GRAPH. It is a good unit test, though.
Coming back to the earlier point: if a different action is played at the same substation, a cancelling action would still be created, although it shouldn't be, since we are only changing the configuration at that substation. What about this problem?
I am not sure I understand the problem.
Let's suppose we have two atomic actions on sub 5, represented as sub-5-0 and sub-5-1.
For example, if we want to apply:
In this situation, OracleAgent will have to undo sub-5-0. This is what it does at line 97, when it searches for previous atomic actions that don't belong to the current action.
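For comparison, here is a sketch of a substation-aware variant of that search: it keeps the previous atomic actions that are not in the current action, but skips those whose substation the current action reconfigures anyway, which is the behaviour asked for earlier in this thread. It assumes atomic actions are named `sub-<id>-<index>` as in the example above; `substation_of` and `actions_to_undo` are hypothetical names, and whether OracleAgent should behave this way is exactly what is being discussed:

```python
from typing import Set


def substation_of(atomic_action: str) -> str:
    """Extract 'sub-<id>' from a name like 'sub-5-0' (assumed naming format)."""
    kind, sub_id, _index = atomic_action.split("-")
    return f"{kind}-{sub_id}"


def actions_to_undo(previous: Set[str], current: Set[str]) -> Set[str]:
    """Previous atomic actions that still need an explicit cancelling action."""
    # Substations whose configuration is set by the current action.
    touched = {substation_of(a) for a in current}
    # Previous actions not replayed now, excluding those on a touched substation.
    return {a for a in previous - current if substation_of(a) not in touched}
```

With this rule, going from `{"sub-5-0"}` to `{"sub-5-1"}` produces no cancelling action, while going from `{"sub-5-0"}` to an empty action still undoes `sub-5-0`.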
Suppose the best path found when playing the oracle contains these two grid2op actions: grid2op_A = action_1 + action_2 and grid2op_B = action_1 + action_2 + action_3.
If we have grid2op_B at timestep t and grid2op_A at timestep t+1, the replay will play:
We need to test that: the topo_vect from the graph node configurations and the one from the replay agent should be the same at every timestep.
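A minimal sketch of that check, assuming we already have one expected topo_vect per timestep (derived from the graph node configurations; how to derive it is not specified here) and one replayed topo_vect per timestep (e.g. `obs.topo_vect` observed by the replay agent). The helper name `first_topo_mismatch` is hypothetical:

```python
from typing import Optional, Sequence


def first_topo_mismatch(
    expected: Sequence[Sequence[int]],
    replayed: Sequence[Sequence[int]],
) -> Optional[int]:
    """Return the first timestep where the topo_vects differ, or None."""
    for t, (exp, rep) in enumerate(zip(expected, replayed)):
        # list() also works when the vectors are numpy arrays.
        if list(exp) != list(rep):
            return t
    if len(expected) != len(replayed):
        # One of the runs has fewer timesteps than the other.
        return min(len(expected), len(replayed))
    return None
```

Returning the timestep index rather than a boolean makes it easy to see which transition (for instance from grid2op_B to grid2op_A) was replayed incorrectly.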