Closed: riccardodv closed this issue 2 years ago
https://github.com/rlberry-py/rlberry/blob/014fcd38b13d09abd61ed55ea6bbd357c25a33d7/rlberry/agents/torch/a2c/a2c.py#L226-L227
Shouldn't we break here? That is what is done a few lines above, in:
https://github.com/rlberry-py/rlberry/blob/014fcd38b13d09abd61ed55ea6bbd357c25a33d7/rlberry/agents/torch/a2c/a2c.py#L221-L222
https://github.com/rlberry-py/rlberry/blob/014fcd38b13d09abd61ed55ea6bbd357c25a33d7/rlberry/agents/torch/a2c/a2c.py#L254-L256
Why do we normalize rewards in A2C?
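To make the question concrete, here is a minimal sketch of the kind of normalization I mean. It is not the rlberry code: the helper names `discounted_returns` and `normalize`, the `eps` value, and the use of the population standard deviation are all assumptions for illustration. The idea is that the per-step targets are shifted to zero mean and scaled to roughly unit variance before they enter the loss, which changes their scale and sign relative to the raw returns.

```python
import statistics


def discounted_returns(rewards, gamma):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1} (hypothetical helper)."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the following step.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-5):
    """Shift to zero mean and scale to ~unit std (hypothetical helper).

    eps avoids division by zero when all values are equal.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption of this sketch
    return [(v - mean) / (std + eps) for v in values]


rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)
normalized = normalize(returns)
```

Note that after normalization some targets become negative even though every raw reward is non-negative, which is part of why I am asking whether this is intended in A2C.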