Closed moriartyjm closed 2 years ago
In https://github.com/rangl-labs/netzerotc/blob/d3de4f656ef39ede93a92d6a3b34395407ade50d/environment/reference_environment_direct_deployment/env.py#L294, we apply the multiplicative noise via https://github.com/rangl-labs/netzerotc/blob/d3de4f656ef39ede93a92d6a3b34395407ade50d/environment/reference_environment_direct_deployment/env.py#L337, i.e. the same noise sample is multiplied into prices/costs from the current year through 2050. If a user rewinds the state, then to avoid over-multiplying the noise for future years one would have to either significantly modify the mechanism of the randomise() function, or record the prices/costs of all years every time before or after randomise() is called.
To avoid this complexity, I took a simpler approach in https://github.com/rangl-labs/netzerotc/blob/d3de4f656ef39ede93a92d6a3b34395407ade50d/visualization/send_receive_osc_env.py#L89-L93: simply record the history of actions, truncate it by the number of backward steps, reset the env, and then replay the truncated action history from the very beginning up to the rewound env.state.step_count, reproducing the desired env.state.
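The reset-and-replay idea can be sketched as below. This is a minimal illustration, not the actual code from send_receive_osc_env.py: it assumes a gym-style env whose transitions are deterministic once the noise is fixed, and the names `ToyEnv`, `RewindableRunner` and `rewind` are hypothetical.

```python
class ToyEnv:
    """Minimal stand-in env: the state is just the running sum of actions."""

    def reset(self):
        self.state = 0
        self.step_count = 0
        return self.state

    def step(self, action):
        self.state += action
        self.step_count += 1
        return self.state, -abs(self.state), False, {}


class RewindableRunner:
    """Record the action history so the env can be rewound by replay."""

    def __init__(self, env):
        self.env = env
        self.action_history = []
        self.env.reset()

    def step(self, action):
        self.action_history.append(action)
        return self.env.step(action)

    def rewind(self, n_back):
        # Truncate the recorded actions by the number of backward steps,
        # reset the env, then replay from the very beginning.
        del self.action_history[len(self.action_history) - n_back:]
        obs = self.env.reset()
        for a in self.action_history:
            obs, _, _, _ = self.env.step(a)
        return obs


runner = RewindableRunner(ToyEnv())
for a in [1, 2, 3, 4]:
    runner.step(a)
obs = runner.rewind(2)            # undo the last two actions
print(obs, runner.env.step_count)  # -> 3 2
```

For the real env, `reset()` would also need to reseed the noise so the replayed trajectory reproduces the same prices/costs.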
The alternative implementation makes sense!
We want to allow the human interaction to step both forwards and backwards in time. Steps to do this:
- Make state.step_count a controllable variable: by default it increments by 1, but the user can make it decrement by 1, 2, ..., state.step_count.
- Modify the apply_action method: if the step decrements, then use observations_all as a lookup for the past state and return this as the observation.
- Adjust the reward so that if the step decrements to t_old, the total reward is reset to the total reward at step t_old.