Max Reward log:
The idea behind this, is that now if an agent is actually getting better, we should see the max rewards and value go higher overtime, even if its a 0 sum game.
QoL was also added in the format of function arguments. You can now call main.py with -h for help, or with --starting_episode for telling the program which checkpoint to load (if 0, it does not load anything)
Signed-off-by: lobotuerk tomas.lobo.it@gmail.com
Max Reward log: The idea behind this, is that now if an agent is actually getting better, we should see the max rewards and value go higher overtime, even if its a 0 sum game. QoL was also added in the format of function arguments. You can now call main.py with -h for help, or with --starting_episode for telling the program which checkpoint to load (if 0, it does not load anything)