[Closed] lennardsnoeks closed this issue 4 years ago
Can you provide some output logs?
Also, for the HyperOptSearch run, what if you increase num_samples to 20?
For PBT, this is the log; the parameters in the best config are always the same as the initial parameters:
== Status ==
Memory usage on this node: 12.5/15.6 GiB
PopulationBasedTraining: 2 checkpoints, 0 perturbs
Resources requested: 0/8 CPUs, 0/1 GPUs, 0.0/6.4 GiB heap, 0.0/2.2 GiB objects
Result logdir: /home/lennard/ray_results/pbt
Number of trials: 4 (4 TERMINATED)
+--------------------------------+------------+-------+----------+------------------+-------+--------+
| Trial name                     | status     | loc   | reward   | total time (s)   | ts    | iter   |
|--------------------------------+------------+-------+----------+------------------+-------+--------|
| DDPG_single_agent_env_f14abc4e | TERMINATED |       | 130.707  | 184.654          | 10000 | 10     |
| DDPG_single_agent_env_f14b558c | TERMINATED |       | 129.657  | 186.621          | 10000 | 10     |
| DDPG_single_agent_env_f14bf5c8 | TERMINATED |       | 124.356  | 193.27           | 10000 | 10     |
| DDPG_single_agent_env_f14d4266 | TERMINATED |       | 129.747  | 201.264          | 10000 | 10     |
+--------------------------------+------------+-------+----------+------------------+-------+--------+
2020-02-28 16:20:10,879 INFO tune.py:352 -- Returning an analysis object by default. You can call
analysis.trials
to retrieve a list of trials. This message will be removed in future versions of Tune. Best config: {'num_workers': 5, 'num_envs_per_worker': 1, 'sample_batch_size': 1, 'batch_mode': 'truncate_episodes', 'num_gpus': 0, 'train_batch_size': 32, 'model': {'conv_filters': None, 'conv_activation': 'relu', 'fcnet_activation': 'tanh', 'fcnet_hiddens': [256, 256], 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action_reward': False, 'state_shape': None, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_action_dist': None, 'custom_options': {}, 'custom_preprocessor': None}, 'optimizer': {}, 'gamma': 0.95, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'env_config': {'sim_state': <utils.steerbench_parser.SimulationState object at 0x7fa0293a1710>, 'mode': 'hyper_param_opt', 'agent_id': 0, 'timesteps_per_iteration': 1000}, 'env': 'single_agent_env', 'normalize_actions': False, 'clip_rewards': None, 'clip_actions': True, 'preprocessor_pref': 'deepmind', 'lr': 0.0001, 'monitor': False, 'log_level': 'WARN', 'callbacks': {'on_episode_start': None, 'on_episode_step': None, 'on_episode_end': None, 'on_sample_end': None, 'on_train_result': None, 'on_postprocess_traj': None}, 'ignore_worker_failures': False, 'log_sys_usage': True, 'use_pytorch': False, 'eager': False, 'eager_tracing': False, 'no_eager_on_workers': False, 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'evaluation_interval': None, 'evaluation_num_episodes': 10, 'in_evaluation': False, 'evaluation_config': {'exploration_fraction': 0, 'exploration_final_eps': 0}, 'evaluation_num_workers': 0, 'custom_eval_function': None, 'sample_async': False, 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': 
True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'compress_observations': False, 'collect_metrics_timeout': 180, 'metrics_smoothing_episodes': 100, 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'min_iter_time_s': 1, 'timesteps_per_iteration': 1000, 'seed': None, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, 'custom_resources_per_worker': {}, 'num_cpus_for_driver': 1, 'memory': 0, 'object_store_memory': 0, 'memory_per_worker': 0, 'object_store_memory_per_worker': 0, 'input': 'sampler', 'input_evaluation': ['is', 'wis'], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'multiagent': {'policies': {}, 'policy_mapping_fn': None, 'policies_to_train': None}, 'twin_q': False, 'policy_delay': 1, 'smooth_target_policy': False, 'target_noise': 0.2, 'target_noise_clip': 0.5, 'use_state_preprocessor': True, 'actor_hiddens': [64, 64], 'actor_hidden_activation': 'relu', 'critic_hiddens': [64, 64], 'critic_hidden_activation': 'relu', 'n_step': 1, 'exploration_should_anneal': True, 'schedule_max_timesteps': 100000, 'exploration_fraction': 1.0, 'exploration_final_scale': 0.02, 'exploration_noise_type': 'ou', 'exploration_ou_noise_scale': 0.1, 'exploration_ou_theta': 0.15, 'exploration_ou_sigma': 0.2, 'exploration_gaussian_sigma': 0.1, 'parameter_noise': False, 'pure_exploration_steps': 1000, 'buffer_size': 100000, 'prioritized_replay': True, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'final_prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'critic_lr': 0.001, 'actor_lr': 0.0001, 'target_network_update_freq': 0, 'tau': 0.001, 'use_huber': False, 'huber_threshold': 1.0, 'l2_reg': 1e-06, 'grad_norm_clipping': None, 'learning_starts': 1500, 'per_worker_exploration': False, 
'worker_side_prioritization': False}
For HyperOptSearch, this is the log for the last iteration performed, which ends in an error. I did not get this error before (and the final analysis config always showed the initial parameters as the best config), but after upgrading to TensorFlow 2.0 and Ray v0.8.2 it now occurs. Here, just as with PBT, the best config always shows the initially chosen parameters.
== Status ==
Memory usage on this node: 11.8/15.6 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 64.000: None | Iter 16.000: None | Iter 4.000: -1398.1349074847149 | Iter 1.000: nan
Resources requested: 5/8 CPUs, 0/1 GPUs, 0.0/6.4 GiB heap, 0.0/2.2 GiB objects
Result logdir: /home/lennard/ray_results/tpe
Number of trials: 10 (1 RUNNING, 9 PENDING)
| Trial name | status | loc | actor_hiddens | critic_hiddens | exploration_noise_type | gamma | observation_filter | train_batch_size | exploration_should_anneal | reward | total time (s) | ts | iter |
| DDPG_single_agent_env_1bb56148 | RUNNING | 192.168.0.103:16424 | (64, 64) | (64, 64) | ou | 0.95 | NoFilter | 32 | True | 12.8053 | 194.271 | 9000 | 9 |
| DDPG_single_agent_env_1bb6743e | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bbb600c | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bc20a10 | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bc8ea10 | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bcf23c6 | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bd572b2 | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bd8efe6 | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1bdd4bf4 | PENDING | | | | | | | | | | | | |
| DDPG_single_agent_env_1be063fc | PENDING | | | | | | | | | | | | |
Result for DDPG_single_agent_env_1bb56148: custom_metrics: {} date: 2020-02-28_15-57-22 done: true episode_len_mean: 151.56923076923076 episode_reward_max: 134.08692420351062 episode_reward_mean: 36.29173875416537 episode_reward_min: -3355.2853201998264 episodes_this_iter: 13 episodes_total: 65 experiment_id: d4535182866f490d9c073f4130211495 experiment_tag: 1_actor_hidden_activation=relu,actor_hiddens=(64, 64),actor_lr=0.0001,batch_mode=truncate_episodes,buffer_size=100000,on_episode_end=None,on_episode_start=None,on_episode_step=None,on_postprocess_traj=None,on_sample_end=None,on_train_result=None,clip_actions=True,clip_rewards=None,collect_metrics_timeout=180,compress_observations=False,critic_hidden_activation=relu,critic_hiddens=(64, 64),critic_lr=0.001,custom_eval_function=None,eager=False,eager_tracing=False,agent_id=0,mode=hyper_param_opt,sim_state=<utils.steerbench_parser.SimulationState object at 0x7ff4d499d150>,timesteps_per_iteration=1000,exploration_final_eps=0,exploration_fraction=0,evaluation_interval=None,evaluation_num_episodes=10,evaluation_num_workers=0,type=StochasticSampling,exploration_final_scale=0.02,exploration_fraction=1.0,exploration_gaussian_sigma=0.1,exploration_noise_type=ou,exploration_ou_noise_scale=0.1,exploration_ou_sigma=0.2,exploration_ou_theta=0.15,exploration_should_anneal=True,explore=True,final_prioritized_replay_beta=0.4,gamma=0.95,grad_norm_clipping=None,horizon=None,huber_threshold=1.0,ignore_worker_failures=False,in_evaluation=False,input=sampler,input_evaluation=['is', 'wis'],l2_reg=1e-06,learning_starts=1500,inter_op_parallelism_threads=8,intra_op_parallelism_threads=8,log_level=WARN,log_sys_usage=True,lr=0.0001,memory=0,memory_per_worker=0,metrics_smoothing_episodes=100,min_iter_time_s=1,conv_activation=relu,conv_filters=None,custom_action_dist=None,custom_model=None,custom_preprocessor=None,dim=84,fcnet_activation=tanh,fcnet_hiddens=[256, 
256],framestack=True,free_log_std=False,grayscale=False,lstm_cell_size=256,lstm_use_prev_action_reward=False,max_seq_len=20,no_final_linear=False,state_shape=None,use_lstm=False,vf_share_layers=True,zero_mean=True,monitor=False,policies_to_train=None,policy_mapping_fn=None,n_step=1,no_done_at_end=False,no_eager_on_workers=False,normalize_actions=False,num_cpus_for_driver=1,num_cpus_per_worker=1,num_envs_per_worker=1,num_gpus=0,num_gpus_per_worker=0,num_workers=4,object_store_memory=0,object_store_memory_per_worker=0,observation_filter=NoFilter,output=None,output_compress_columns=['obs', 'new_obs'],output_max_file_size=67108864,parameter_noise=False,per_worker_exploration=False,policy_delay=1,postprocess_inputs=False,preprocessor_pref=deepmind,prioritized_replay=True,prioritized_replay_alpha=0.6,prioritized_replay_beta=0.4,prioritized_replay_eps=1e-06,pure_exploration_steps=1000,remote_env_batch_wait_ms=0,remote_worker_envs=False,sample_async=False,sample_batch_size=1,schedule_max_timesteps=100000,seed=None,shuffle_buffer_size=0,smooth_target_policy=False,soft_horizon=False,synchronize_filters=True,target_network_update_freq=0,target_noise=0.2,target_noise_clip=0.5,tau=0.001,allow_soft_placement=True,CPU=1,allow_growth=True,inter_op_parallelism_threads=2,intra_op_parallelism_threads=2,log_device_placement=False,timesteps_per_iteration=1000,train_batch_size=32,twin_q=False,use_huber=False,use_pytorch=False,use_state_preprocessor=True,worker_side_prioritization=False hostname: lennard-pc info: exploration_infos:
- 0.9117999999999999
- 0.9117999999999999
- 0.9117999999999999
- 0.9117999999999999
- 0.9117999999999999 grad_time_ms: 11.656 learner: default_policy: max_q: 12.72179126739502 mean_q: 0.6311368346214294 min_q: -25.07093048095703 num_steps_sampled: 10000 num_steps_trained: 68000 num_target_updates: 2500 opt_peak_throughput: 2745.345 opt_samples: 32.0 replay_time_ms: 4.69 sample_time_ms: 90.839 update_time_ms: 18.14 iterations_since_restore: 10 node_ip: 192.168.0.103 num_healthy_workers: 4 off_policy_estimator: {} perf: cpu_util_percent: 60.64333333333333 gpu_util_percent0: 0.0 ram_util_percent: 75.85999999999999 vram_util_percent0: 0.010479041916167664 pid: 16424 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_env_wait_ms: 33.53771129821599 mean_inference_ms: 2.0699819800322556 mean_processing_ms: 0.8276820643794345 time_since_restore: 216.30549907684326 time_this_iter_s: 22.034335613250732 time_total_s: 216.30549907684326 timestamp: 1582901842 timesteps_since_restore: 10000 timesteps_this_iter: 1000 timesteps_total: 10000 training_iteration: 10 trial_id: 1bb56148
2020-02-28 15:57:22,590 ERROR trial_runner.py:513 -- Trial DDPG_single_agent_env_1bb56148: Error processing event.
Traceback (most recent call last):
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 511, in _process_trial
    self._execute_action(trial, decision)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 595, in _execute_action
    self.trial_executor.stop_trial(trial)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 263, in stop_trial
    trial, error=error, error_msg=error_msg, stop_logger=stop_logger)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 204, in _stop_trial
    trial.close_logger()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial.py", line 315, in close_logger
    self.result_logger.close()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 305, in close
    _logger.close()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 233, in close
    self._try_log_hparams(self.last_result)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 244, in _try_log_hparams
    hparam_dict=scrubbed_params, metric_dict=result)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorboardX/summary.py", line 102, in hparams
    v = make_np(v)[0]
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorboardX/x2num.py", line 34, in make_np
    'Got {}, but expected numpy array or torch tensor.'.format(type(x)))
NotImplementedError: Got <class 'tuple'>, but expected numpy array or torch tensor.
Traceback (most recent call last):
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 511, in _process_trial
    self._execute_action(trial, decision)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 595, in _execute_action
    self.trial_executor.stop_trial(trial)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 263, in stop_trial
    trial, error=error, error_msg=error_msg, stop_logger=stop_logger)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 204, in _stop_trial
    trial.close_logger()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial.py", line 315, in close_logger
    self.result_logger.close()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 305, in close
    _logger.close()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 233, in close
    self._try_log_hparams(self.last_result)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 244, in _try_log_hparams
    hparam_dict=scrubbed_params, metric_dict=result)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorboardX/summary.py", line 102, in hparams
    v = make_np(v)[0]
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorboardX/x2num.py", line 34, in make_np
    'Got {}, but expected numpy array or torch tensor.'.format(type(x)))
NotImplementedError: Got <class 'tuple'>, but expected numpy array or torch tensor.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/lennard/DRL_Crowd_Simulation/crowd-sim-RL/simulations/hyper_param_opti_tpe.py", line 81, in <module>
    main()
  File "/home/lennard/DRL_Crowd_Simulation/crowd-sim-RL/simulations/hyper_param_opti_tpe.py", line 17, in main
    train(sim_state)
  File "/home/lennard/DRL_Crowd_Simulation/crowd-sim-RL/simulations/hyper_param_opti_tpe.py", line 75, in train
    analysis = run("DDPG", name="tpe", num_samples=20, search_alg=search, scheduler=scheduler, stop=stop, config=config)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/tune.py", line 324, in run
    runner.step()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 335, in step
    self._process_events()  # blocking
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 444, in _process_events
    self._process_trial(trial)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 514, in _process_trial
    self._process_trial_failure(trial, traceback.format_exc())
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 580, in _process_trial_failure
    trial, error=True, error_msg=error_msg)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 263, in stop_trial
    trial, error=error, error_msg=error_msg, stop_logger=stop_logger)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 204, in _stop_trial
    trial.close_logger()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial.py", line 315, in close_logger
    self.result_logger.close()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 305, in close
    _logger.close()
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 233, in close
    self._try_log_hparams(self.last_result)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/logger.py", line 244, in _try_log_hparams
    hparam_dict=scrubbed_params, metric_dict=result)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorboardX/summary.py", line 102, in hparams
    v = make_np(v)[0]
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorboardX/x2num.py", line 34, in make_np
    'Got {}, but expected numpy array or torch tensor.'.format(type(x)))
NotImplementedError: Got <class 'tuple'>, but expected numpy array or torch tensor.
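The `NotImplementedError` comes from tensorboardX's HParams logging, which cannot serialize tuple-valued hyperparameters such as `actor_hiddens=(64, 64)`. As a workaround (my suggestion, not something from this thread), such values can be converted to lists before the config is handed to Tune; a minimal sketch of a recursive tuple-to-list conversion:

```python
def tuples_to_lists(value):
    """Recursively replace tuples with lists so tensorboardX's
    hparams logging does not choke on tuple-valued config entries."""
    if isinstance(value, (tuple, list)):
        return [tuples_to_lists(v) for v in value]
    if isinstance(value, dict):
        return {k: tuples_to_lists(v) for k, v in value.items()}
    return value

config = {"actor_hiddens": (64, 64), "model": {"fcnet_hiddens": (256, 256)}}
clean = tuples_to_lists(config)
# clean == {"actor_hiddens": [64, 64], "model": {"fcnet_hiddens": [256, 256]}}
```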
Also, HParams is not active in TensorBoard for the HyperOptSearch run; this message is displayed in the terminal:
hparams_plugin.py:104] HParams error: Can't find an HParams-plugin experiment data in the log directory. Note that it takes some time to scan the log directory; if you just started Tensorboard it could be that we haven't finished scanning it yet. Consider trying again in a few seconds.
**EDIT:** This error occurs when `exploration_noise_type` is set to `"gaussian"`.
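For reproduction, the setting in question is a single key in the DDPG config; a minimal hypothetical fragment (the other keys and their defaults are taken from the full config dump above):

```python
# Hypothetical minimal DDPG config fragment; changing the noise type
# from the default "ou" to "gaussian" is what surfaces the error.
config = {
    "env": "single_agent_env",
    "exploration_noise_type": "gaussian",  # default is "ou"
    "exploration_gaussian_sigma": 0.1,
}
```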
Also, I don't know if it's fully relevant, but sometimes the PBT run errors on the last trial, when the others have completed, and gives the following error:
Failure # 1 (occurred at 2020-02-28_17-01-29)
Traceback (most recent call last):
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 459, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 377, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/worker.py", line 1504, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::DDPG.__init__() (pid=24923, ip=192.168.0.103)
  File "python/ray/_raylet.pyx", line 437, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 437, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 430, in ray._raylet.execute_task.function_executor
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 86, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 447, in __init__
    super().__init__(config, logger_creator)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/tune/trainable.py", line 172, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 591, in _setup
    self._init(self.config, self.env_creator)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 105, in _init
    self.config["num_workers"])
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 658, in _make_workers
    logdir=self.logdir)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 60, in __init__
    RolloutWorker, env_creator, policy, 0, self._local_config)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 262, in _make_worker
    _fake_sampler=config.get("_fake_sampler", False))
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 355, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 820, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/ray/rllib/agents/ddpg/ddpg_policy.py", line 131, in __init__
    exploration_sample = tf.get_variable(name="ornstein_uhlenbeck")
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1504, in get_variable
    aggregation=aggregation)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1247, in get_variable
    aggregation=aggregation)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 567, in get_variable
    aggregation=aggregation)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 519, in _true_getter
    aggregation=aggregation)
  File "/home/lennard/anaconda3/envs/env1/lib/python3.7/site-packages/tensorflow_core/python/ops/variable_scope.py", line 886, in _get_single_variable
    "reuse=tf.AUTO_REUSE in VarScope?" % name)
ValueError: Variable default_policy/action/ornstein_uhlenbeck does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?
Seems to be working now; I think it was an environment-related bug.
What is your question?
Is it not possible to tune the parameters of DDPG with population-based training in the following way (based on https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/pbt_ppo_example.py)?
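Roughly, a setup along these lines, adapted from the linked pbt_ppo_example.py (the mutation keys and ranges shown here are illustrative assumptions, not necessarily the exact values used):

```python
import random
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# Illustrative mutation space; the keys must be valid DDPG config entries.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=2,
    hyperparam_mutations={
        "actor_lr": lambda: random.uniform(1e-5, 1e-3),
        "critic_lr": lambda: random.uniform(1e-5, 1e-3),
        "train_batch_size": [32, 64, 128],
        "gamma": lambda: random.uniform(0.9, 0.99),
    },
)

analysis = tune.run(
    "DDPG",
    name="pbt",
    scheduler=pbt,
    num_samples=4,
    stop={"training_iteration": 10},
    config={"env": "single_agent_env", "num_workers": 5},  # base config as above
)
print("Best config:", analysis.get_best_config(metric="episode_reward_mean"))
```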
The resulting config always contains the initially set variables. I also tried HyperOptSearch in the following way (based on https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/hyperopt_example.py):
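Roughly, a sketch along the lines of the linked hyperopt_example.py (the search space and current-best values shown here are illustrative assumptions):

```python
from hyperopt import hp
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.hyperopt import HyperOptSearch

# Illustrative search space over a few DDPG parameters.
space = {
    "actor_hiddens": hp.choice("actor_hiddens", [[64, 64], [400, 300]]),
    "critic_hiddens": hp.choice("critic_hiddens", [[64, 64], [400, 300]]),
    "gamma": hp.uniform("gamma", 0.9, 0.999),
    "train_batch_size": hp.choice("train_batch_size", [32, 64, 128]),
}

# Initial point to evaluate first; for hp.choice entries the value is
# the index into the choice list.
current_best_params = [{
    "actor_hiddens": 0,
    "critic_hiddens": 0,
    "gamma": 0.95,
    "train_batch_size": 0,
}]

search = HyperOptSearch(
    space,
    metric="episode_reward_mean",
    mode="max",
    points_to_evaluate=current_best_params,
)
scheduler = AsyncHyperBandScheduler(metric="episode_reward_mean", mode="max")

analysis = tune.run(
    "DDPG",
    name="tpe",
    num_samples=20,
    search_alg=search,
    scheduler=scheduler,
    stop={"training_iteration": 10},
    config={"env": "single_agent_env"},  # fixed (non-searched) config entries
)
```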
Here, the parameters to optimize never change (as seen in a print statement each iteration), and the resulting config is always the same as the parameters set in `current_best_params`. There is also no HParams output for TensorBoard, although it was present for the hyperopt_example (which uses a custom model).
Ray version and other system information (Python version, TensorFlow version, OS):
- Python 3.7.4
- Ray 0.8.2
- TensorFlow 2.0.0
- TensorBoard 2.0.2