nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Convolutional? #51

Closed Bobingstern closed 2 years ago

Bobingstern commented 2 years ago

I want to use this for Atari games, but I'm unsure how to change it to use CNN layers. Can I simply change the actor and critic models to use Conv2d layers, or do I also need to change the replay buffer to handle multi-dimensional arrays?

nikhilbarhate99 commented 2 years ago

You might have to (not sure) change the buffer to handle the images and add Conv2d layers in the actor and critic models. Also, getting RL algorithms that have not been tested on Atari to train on Atari takes quite a bit of work.
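As a rough sketch (not part of this repo), a CNN-based actor-critic for image observations might look something like the following. The layer sizes are the common "Nature DQN" convolutional stack and assume 84x84 stacked grayscale frames; the class and method names are illustrative only:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class CnnActorCritic(nn.Module):
    """Hypothetical CNN actor-critic for image observations (illustrative sketch)."""

    def __init__(self, in_channels: int, action_dim: int):
        super().__init__()
        # Shared convolutional encoder; sizes follow the common "Nature DQN"
        # stack and assume 84x84 input frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        )
        self.actor = nn.Sequential(nn.Linear(512, action_dim), nn.Softmax(dim=-1))
        self.critic = nn.Linear(512, 1)

    def act(self, state):
        # state: (batch, in_channels, 84, 84) float tensor scaled to [0, 1]
        features = self.encoder(state)
        dist = Categorical(self.actor(features))
        action = dist.sample()
        return action, dist.log_prob(action), self.critic(features)
```

The buffer would then store image tensors of shape `(in_channels, 84, 84)` instead of flat state vectors, so whatever stacks the stored states into a batch needs to preserve that shape.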

You can still first try it on a simple Atari game like Pong, or on a similar but lighter env like MinAtar.

You can also use tricks like passing the difference of consecutive observation images to the model; refer to Andrej Karpathy's blog post on deep RL for Pong. Although he implements everything from scratch, you can use PyTorch and apply the same tricks.
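A minimal sketch of that frame-difference trick, loosely following the Pong preprocessing in Karpathy's post (the crop offsets and background pixel values are assumptions specific to Pong, not something from this repo):

```python
import numpy as np

def preprocess(frame):
    # Hypothetical Pong preprocessing: crop the playing field, downsample by 2,
    # and binarize to an 80x80 single-channel image.
    frame = frame[35:195]              # crop out the scoreboard
    frame = frame[::2, ::2, 0].copy()  # downsample, keep one color channel
    frame[frame == 144] = 0            # erase background (type 1)
    frame[frame == 109] = 0            # erase background (type 2)
    frame[frame != 0] = 1              # paddles and ball become 1
    return frame.astype(np.float32)

class DiffObservation:
    """Returns the difference of consecutive preprocessed frames,
    so a feedforward policy can still perceive motion."""

    def __init__(self):
        self.prev = None

    def __call__(self, frame):
        cur = preprocess(frame)
        obs = cur - self.prev if self.prev is not None else np.zeros_like(cur)
        self.prev = cur
        return obs
```

You would wrap the raw env observation with something like this before feeding it to the policy, and reset the stored previous frame at the start of each episode.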