Right now, different games seem to break our learning rate parameter (TODO: Confirm this is because you changed the game, and not because you changed the batch sizes). Should we normalize in some way (e.g., divide values and prices by the budget) to keep everything on roughly the same scale?
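A minimal sketch of the budget normalization floated above, to make the idea concrete. The `Game` container and its field names (`budget`, `values`, `prices`) are hypothetical and are not taken from our codebase. The sketch assumes a single scalar budget per game and does not show whether this actually fixes the learning rate sensitivity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Game:
    # Hypothetical container; field names are assumptions for illustration.
    budget: float
    values: List[float]
    prices: List[float]


def normalize_by_budget(game: Game) -> Game:
    """Rescale values and prices so that the budget becomes 1."""
    if game.budget <= 0:
        raise ValueError("budget must be positive")
    scale = game.budget
    return Game(
        budget=1.0,
        values=[v / scale for v in game.values],
        prices=[p / scale for p in game.prices],
    )


if __name__ == "__main__":
    raw = Game(budget=100.0, values=[50.0, 200.0], prices=[25.0, 10.0])
    print(normalize_by_budget(raw))
```

If games differ mainly in their monetary scale, this would put values and prices in comparable units across games; the batch-size TODO would still need checking separately.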