qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.
MIT License
507 stars 109 forks

ValueError: high <= 0 #15

Closed: learningxiaobai closed this issue 2 years ago

learningxiaobai commented 2 years ago

Hello, when I change `max_episode_steps=300` in the register and train with TQC in SB3, I get this error. What is the problem? Thanks.

```
python train.py --algo tqc --env PandaStack-v1 -params n_envs:10
========== PandaStack-v1 ==========
Seed: 3400246078
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 1024),
             ('buffer_size', 1000000),
             ('env_wrapper', 'sb3_contrib.common.wrappers.TimeFeatureWrapper'),
             ('gamma', 0.95),
             ('learning_rate', 0.001),
             ('learning_starts', 1000),
             ('n_envs', 10),
             ('n_timesteps', 30000000000.0),
             ('policy', 'MultiInputPolicy'),
             ('policy_kwargs', 'dict(net_arch=[512, 512, 512], n_critics=2)'),
             ('replay_buffer_class', 'HerReplayBuffer'),
             ('replay_buffer_kwargs', "dict( online_sampling=True, goal_selection_strategy='future', n_sampled_goal=4, )"),
             ('tau', 0.05)])
Using 10 environments
Creating test environment
pybullet build time: Nov 2 2021 15:42:29
argv[0]=
C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\gym\logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
argv[0]=
Using cuda device
Log path: logs/tqc/PandaStack-v1_6
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    exp_manager.learn(model)
  File "C:\codes\rl-baselines3-zoo-master\utils\exp_manager.py", line 202, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\sb3_contrib\tqc\tqc.py", line 299, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 375, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\sb3_contrib\tqc\tqc.py", line 194, in train
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 652, in sample
    samples.append(self.buffers[i].sample(int(batch_sizes[i]), env))
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 212, in sample
    return self._sample_transitions(batch_size, maybe_vec_env=env, online_sampling=True)  # pytype: disable=bad-return-type
  File "C:\ProgramData\Anaconda3\envs\robot_gym\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 295, in _sample_transitions
    episode_indices = np.random.randint(0, self.n_episodes_stored, batch_size)
  File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1338, in numpy.random._bounded_integers._rand_int32
ValueError: high <= 0
```
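One plausible reading of the traceback: with `learning_starts=1000` split across 10 environments, each env has taken only about 100 steps when the first gradient update fires, which is less than the new episode length of 300, so the online HER buffer holds zero complete episodes and `_sample_transitions` calls `np.random.randint` with an empty range. A minimal sketch of that failure mode (the variable names mirror the traceback; this is not the actual SB3 buffer code):

```python
import numpy as np

# The HER buffer draws episode indices with
# np.random.randint(0, n_episodes_stored, batch_size).
# Before any episode finishes, n_episodes_stored is 0,
# so randint is asked for an empty range and raises ValueError.
n_episodes_stored = 0   # buffer state before the first complete episode
batch_size = 1024

try:
    episode_indices = np.random.randint(0, n_episodes_stored, batch_size)
except ValueError as exc:
    print(f"ValueError: {exc}")
```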

learningxiaobai commented 2 years ago

I changed some settings and it works now.
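The comment above does not say what was changed, but one consistent fix for this crash is to make `learning_starts` large enough that every parallel environment completes at least one episode before training begins. A hypothetical helper that encodes the arithmetic (`safe_learning_starts` is not part of SB3 or the zoo, just an illustration):

```python
def safe_learning_starts(max_episode_steps: int, n_envs: int, n_episodes: int = 1) -> int:
    """Smallest learning_starts that lets each of the n_envs parallel
    environments finish at least n_episodes full episodes before the
    first gradient update (total steps are split across the envs)."""
    return max_episode_steps * n_envs * n_episodes

# With the settings from the log: 300-step episodes, 10 envs.
# The result (3000) exceeds the default learning_starts of 1000,
# which is consistent with the empty-buffer crash above.
print(safe_learning_starts(300, 10))  # 3000
```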