hariharan-jayakumar opened 4 years ago
Hi @gsurma,
Thank you for the wonderful code and the Medium article. I tried implementing your code, but the loss in my model blows up after some time.
These are the hyper-parameters I used:
```python
# initialize environment
env = MainGymWrapper.wrap(gym.make('SpaceInvaders-v0'))

# define hyperparameters
total_step_limit = 5000000
wandb.config.episodes = 1000
GAMMA = 0.99
MEMORY_SIZE = 350000
BATCH_SIZE = 32
TRAINING_FREQUENCY = 4
TARGET_NETWORK_UPDATE_FREQUENCY = 40000
MODEL_PERSISTENCE_UPDATE_FREQUENCY = 10000
REPLAY_START_SIZE = 50000
action_size = env.action_space.n
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.1
EXPLORATION_TEST = 0.02
EXPLORATION_STEPS = 425000
EXPLORATION_DECAY = (EXPLORATION_MAX - EXPLORATION_MIN) / EXPLORATION_STEPS
wandb.config.batch_size = 32
wandb.config.learning_rate = 0.00025
input_shape = (4, 84, 84)
```
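For reference, here is my understanding of what the `EXPLORATION_DECAY` value implies: a linear anneal of epsilon from `EXPLORATION_MAX` down to `EXPLORATION_MIN` over `EXPLORATION_STEPS` environment steps (the loop below is just a sketch of that schedule, not your exact training loop):

```python
# Sketch of the linear epsilon-decay schedule implied by the
# hyperparameters above: epsilon shrinks by EXPLORATION_DECAY per step
# and is clamped at the EXPLORATION_MIN floor.
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.1
EXPLORATION_STEPS = 425000
EXPLORATION_DECAY = (EXPLORATION_MAX - EXPLORATION_MIN) / EXPLORATION_STEPS

epsilon = EXPLORATION_MAX
for step in range(EXPLORATION_STEPS):
    epsilon = max(EXPLORATION_MIN, epsilon - EXPLORATION_DECAY)

print(epsilon)  # epsilon has reached its 0.1 floor after 425000 steps
```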
The CNN is the same. I also clipped the rewards with np.sign.
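To be concrete about the reward clipping I mean (the standard trick from the DQN paper), every raw reward is mapped to -1, 0, or +1:

```python
import numpy as np

# np.sign collapses reward magnitudes to their sign only,
# so large Atari scores don't dominate the TD error.
raw_rewards = np.array([0.0, 25.0, -5.0, 200.0])
clipped = np.sign(raw_rewards)
print(clipped.tolist())  # [0.0, 1.0, -1.0, 1.0]
```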
Can you guide me on what might be going wrong?