Closed moriartyjm closed 2 years ago
In https://github.com/rangl-labs/netzerotc/blob/d3de4f656ef39ede93a92d6a3b34395407ade50d/environment/reference_environment_direct_deployment/env.py#L294, we apply the multiplicative noise via https://github.com/rangl-labs/netzerotc/blob/d3de4f656ef39ede93a92d6a3b34395407ade50d/environment/reference_environment_direct_deployment/env.py#L337, i.e. the same noise sample is multiplied into prices/costs from the current year through 2050. If a user rewinds the state, then to avoid over-multiplying the noise for future years one would have to either significantly modify the mechanism of the randomise() function, or record the prices/costs of all years every time before or after randomise() is called.
To avoid this complexity, I took a simpler approach in https://github.com/rangl-labs/netzerotc/blob/d3de4f656ef39ede93a92d6a3b34395407ade50d/visualization/send_receive_osc_env.py#L89-L93: simply record the history of actions, truncate it by the number of backward steps, reset the env, and then replay the truncated action history from the very beginning up to the rewound env.state.step_count, reproducing the desired env.state.
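The reset-and-replay idea can be sketched as below. This is a minimal illustration, not the actual code from send_receive_osc_env.py: it assumes a gym-style env whose transitions are deterministic once the noise is fixed, and the names `ToyEnv`, `RewindableRunner` and `rewind` are hypothetical.

```python
class ToyEnv:
    """Minimal stand-in env: the state is just the running sum of actions."""

    def reset(self):
        self.state = 0
        self.step_count = 0
        return self.state

    def step(self, action):
        self.state += action
        self.step_count += 1
        return self.state, -abs(self.state), False, {}


class RewindableRunner:
    """Record the action history so the env can be rewound by replay."""

    def __init__(self, env):
        self.env = env
        self.action_history = []
        self.env.reset()

    def step(self, action):
        self.action_history.append(action)
        return self.env.step(action)

    def rewind(self, n_back):
        # Truncate the recorded actions by the number of backward steps,
        # reset the env, then replay from the very beginning.
        del self.action_history[len(self.action_history) - n_back:]
        obs = self.env.reset()
        for a in self.action_history:
            obs, _, _, _ = self.env.step(a)
        return obs


runner = RewindableRunner(ToyEnv())
for a in [1, 2, 3, 4]:
    runner.step(a)
obs = runner.rewind(2)            # undo the last two actions
print(obs, runner.env.step_count)  # -> 3 2
```

For the real env, `reset()` would also need to reseed the noise so the replayed trajectory reproduces the same prices/costs.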
The alternative implementation makes sense!
We want to allow the human interaction to step both forwards and backwards in time. Steps to do this:
- Make state.step_count a controllable variable: by default it increments by 1, but the user can make it decrement by 1, 2, ..., state.step_count.
- Modify the apply_action method: if the step decrements, then use observations_all as a lookup for the past state and return this as the observation.
- Adjust the reward so that if the step decrements to t_old, the total reward is reset to the total reward at step t_old.