Closed: regproj closed this issue 3 years ago.
@regproj Have you solved this issue? I am seeing the same problem when I use noisy nets ("noisy": True): I trained the blue curve with epsilon-greedy exploration and the pink curve with noisy nets. It is worth noting that the two configs are exactly the same except for the "noisy" setting:
"eager": True,
"use_exec_api": False,
# Model
"num_atoms": 51, # Distributional RL
"v_min": -200, # Distributional RL
"v_max": 200, # Distributional RL
"noisy": True, # Noisy Nets
"sigma0": 0.1, # Noisy Nets (Rainbow paper says use 0.5 if GPU, 0.1 if CPU
"dueling": True, # Dueling Network Architecture
"double_q": True, # Double DQN (DDQN)
"hiddens": [2*51], # num_actions * num_atoms
"n_step": 3, # N-step Q Learning / Multi-step Returns
# Exploration
# === Exploration Settings (Experimental) ===
"exploration_config": {
# The Exploration class to use.
"type": "EpsilonGreedy",
# Config for the Exploration class' constructor:
"initial_epsilon": 1.0,
"final_epsilon": 0.0, #"exploration_final_eps", in older version
"epsilon_timesteps": 200000, # Timesteps over which to anneal epsilon. "exploration_fraction": in older version
# For soft_q, use:
# "exploration_config" = {
# "type": "SoftQ"
# "temperature": [float, e.g. 1.0]
# }
},
# "schedule_max_timesteps": 2000000, # 2e6
# "exploration_fraction": 0.01, # Not needed when using Noisy Nets
# "exploration_final_eps": 0.0, # Not needed when using Noisy Nets
"target_network_update_freq": 8192, # DQN
# "soft_q": False,
# "softmax_temp": 1.0,
# "parameter_noise": False, # This is NOT Noisy Nets
# Replay buffer
"buffer_size": 500000, # 5e5 # DQN
"prioritized_replay": True, # Prioritized Experience Replay
"prioritized_replay_alpha": 0.5, # Prioritized Experience Replay
"prioritized_replay_beta": 0.4, # Prioritized Experience Replay
# "beta_annealing_fraction": 1.0, # Prioritized Experience Replay
"prioritized_replay_beta_annealing_timesteps": 2000000,
"final_prioritized_replay_beta": 1.0, # Prioritized Experience Replay
"prioritized_replay_eps": 1e-6, # Prioritized Experience Replay
"compress_observations": True,
# Optimization
"gamma": 0.99,
"lr": 1e-4,
# "lr_schedule": None,
"adam_epsilon": 1.5e-4,
# "grad_clip": 40,
"learning_starts": 20000,
"rollout_fragment_length": 4,
"train_batch_size": 32,
"timesteps_per_iteration": 200,
# Parallelism
"num_workers": 0,
# "optimizer_class": "SyncReplayOptimizer",
# "per_worker_exploration": False,
"worker_side_prioritization": False,
"min_iter_time_s": 1.,
What is the problem?
I just wanted to run some basic tests of DQN on the CartPole environment to check things over before running it on my own environment. I'm wondering if I somehow set up the parameters wrong, as it doesn't seem to be learning.
Ray version and other system information (Python version, TensorFlow version, OS): Ray 0.8.4, TensorFlow 1.14.0, Python 3.6, Ubuntu 18.04
Reproduction (REQUIRED)
Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):
If we cannot run your script, we cannot fix your issue.
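For illustration only, a minimal self-contained script of the kind this template asks for might look like the sketch below. This is a hypothetical example, not the reporter's actual script: it assumes Ray 0.8.4's `tune.run` API and Gym's `CartPole-v0`, and the stop criterion and config values are placeholders.

```python
# Hypothetical reproduction sketch (NOT the reporter's script): basic DQN
# on CartPole with Ray 0.8.4's tune API, mirroring the config style above.
import ray
from ray import tune

if __name__ == "__main__":
    ray.init()
    tune.run(
        "DQN",
        stop={"episode_reward_mean": 150},  # illustrative stop criterion
        config={
            "env": "CartPole-v0",
            "num_workers": 0,
            "lr": 1e-4,
            "noisy": True,  # flip to False for the epsilon-greedy baseline
        },
    )
```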