Bugs in PPO

sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

MIT License

3.88k stars, 844 forks

Bugs in PPO #6

Open moonblue333 opened 5 years ago

moonblue333 commented 5 years ago

1) counter

2) for index in BatchSampler(SubsetRandomSampler(range(self.buffer_capacity), self.batch_size, True)):

yuntao-ma commented 4 years ago

How to solve bug 2?
It seems that "done" from the env hasn't been dealt with. Why?

Thanks.
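On the question about "done": the thread does not show how the repository computes returns, so the following is only a generic sketch, not the repository's code. It assumes the rollout stores a done flag for every step (the Transition tuple quoted later in this thread has no such field) and shows one common way to keep discounted returns from leaking across episode boundaries. The function name `discounted_returns` and the `gamma` default are hypothetical.

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Compute discounted returns, resetting the running sum at episode ends.

    rewards: list of floats, one per stored step
    dones:   list of bools, True where that step ended an episode
    """
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            # Nothing after a terminal step belongs to this episode.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Two episodes stored back to back: the first ends at index 1.
print(discounted_returns([1.0, 1.0, 1.0], [False, True, False], gamma=0.5))
# [1.5, 1.0, 1.0]
```

Without the reset, the return of the first two steps would also include the reward of the third step, which belongs to a different episode.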

HuangHaoyu1997 commented 4 years ago

@yuntao-ma for index in BatchSampler(SubsetRandomSampler(range(self.buffer_capacity)), self.batch_size, True):
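The difference from the line in the original report is only where the parenthesis closes: `SubsetRandomSampler` should receive just the indices, while the batch size and the `drop_last` flag belong to `BatchSampler`. A small standalone sketch of the corrected call follows; `buffer_capacity` and `batch_size` are made-up values standing in for the `self.` attributes.

```python
from torch.utils.data import BatchSampler, SubsetRandomSampler

buffer_capacity = 10  # hypothetical stand-in for self.buffer_capacity
batch_size = 4        # hypothetical stand-in for self.batch_size

# SubsetRandomSampler takes only the indices to shuffle.
sampler = SubsetRandomSampler(range(buffer_capacity))

# BatchSampler takes (sampler, batch_size, drop_last).
batches = list(BatchSampler(sampler, batch_size, True))

for index in batches:
    # Each `index` is a list of batch_size distinct buffer positions.
    print(index)
```

With `drop_last` set to True, the two leftover indices in this example are discarded, so every batch has exactly `batch_size` entries.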

brezezee commented 4 years ago

Why do I only get NaN actions when I train with this code?

xxx-007 commented 3 years ago

I get NaN actions too.
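The thread never identifies the cause of the NaN actions. A minimal diagnostic sketch, assuming PyTorch tensors, is to fail fast at the first non-finite value so the step that produces it can be located. `assert_finite` is a hypothetical helper, not part of the repository.

```python
import torch


def assert_finite(name, tensor):
    """Raise as soon as a tensor contains NaN or +/-inf; otherwise return it."""
    if not torch.isfinite(tensor).all():
        raise ValueError(f"{name} contains NaN or inf: {tensor}")
    return tensor


# Example: check a (made-up) action mean before sampling from it.
mu = assert_finite("mu", torch.tensor([0.1, -0.3]))
```

Calling such a check on the network outputs and on the loss inside the update loop narrows down where the first bad value appears. PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally help trace a NaN that arises in the backward pass.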

HzcIrving commented 3 years ago

I changed the code to:

for index in BatchSampler(SubsetRandomSampler(range(self.buffer_capacity)), self.batch_size, True):

but there is still a bug:

Traceback (most recent call last):
  File "E:/AAAFor_PHD/UUV_SCI_Modif/UUV_obs_env/PPO2/Demo/PPO_demo.py", line 195, in <module>
    main()
  File "E:/AAAFor_PHD/UUV_SCI_Modif/UUV_obs_env/PPO2/Demo/PPO_demo.py", line 175, in main
    next_state, reward, done, info = env.step(action)
  File "F:\Anaconda\envs\Obstacle_Avoid\lib\site-packages\gym\envs\classic_control\pendulum.py", line 49, in step
    u = np.clip(u, -self.max_torque, self.max_torque)[0]
IndexError: invalid index to scalar variable.
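The last frame of the traceback indexes the result of `np.clip` with `[0]`, so that version of the Pendulum environment appears to expect the action as a one-element sequence. One plausible reading, which is an assumption because PPO_demo.py is not shown in the thread, is that `action` reaches `env.step` as a plain scalar. The sketch below reproduces only that expression with NumPy; `clip_like_pendulum` and the `max_torque` value are hypothetical stand-ins.

```python
import numpy as np

max_torque = 2.0  # assumed value, standing in for self.max_torque


def clip_like_pendulum(u):
    # Same expression as the line shown in the traceback.
    return np.clip(u, -max_torque, max_torque)[0]


try:
    clip_like_pendulum(0.5)  # scalar action
except IndexError as exc:
    print("scalar action fails:", exc)

print(clip_like_pendulum([0.5]))            # one-element list works
print(clip_like_pendulum(np.array([3.0])))  # clipped to max_torque
```

If that reading is right, wrapping the action before stepping, for example `env.step([action])` or `env.step(np.array([action]))`, would avoid the IndexError.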

haohaoqian commented 9 months ago

There is a typo in this line; 'aciton' should be 'action':

Transition = namedtuple('Transition',['state', 'aciton', 'reward', 'a_log_prob', 'next_state'])
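A short sketch of why this typo can go unnoticed: a namedtuple built positionally works with either spelling, and the misspelling only surfaces when the field is read by name. The `Misspelled` name and the sample values are made up for illustration.

```python
from collections import namedtuple

# Field list as quoted above, typo included.
Misspelled = namedtuple('Transition', ['state', 'aciton', 'reward', 'a_log_prob', 'next_state'])
# Corrected field list.
Transition = namedtuple('Transition', ['state', 'action', 'reward', 'a_log_prob', 'next_state'])

values = ([0.0], 1.0, -0.5, -1.2, [0.1])  # made-up values for illustration

t_bad = Misspelled(*values)  # positional construction succeeds despite the typo
t_ok = Transition(*values)

print(hasattr(t_bad, 'action'))  # False
print(t_ok.action)               # 1.0
```

Code that reads the field as `t.action` would raise AttributeError with the misspelled definition, while code that consistently uses `t.aciton` would keep working.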

SMALLFISH-hub commented 1 month ago

Hello, I ran into the same error as in HzcIrving's comment above (IndexError: invalid index to scalar variable.). Did you manage to solve it in the end? Thanks.