nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

PPO instead of PPO-M #20

Closed · murtazabasu closed this issue 4 years ago

murtazabasu commented 4 years ago

Hi, I am following your code for my implementation. After seeing the results (which are good, by the way), I want to give full PPO a shot. I know there are already implementations in other repos, but I find this one pretty easy to follow. What kind of modifications would I need to make in your code for the PPO implementation? And is it recommended to modify this code, or to follow another repo? There are some good repos out there, but they are made specifically for Atari and MuJoCo environments, rely on OpenAI Baselines, and are difficult to modify for my environment; I also need Python 3.5+ to work with Baselines. Since I am working with ROS Melodic, which ships with Python 2.7 by default, I can't really use Baselines. Any suggestions would be appreciated.
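(For context: if "PPO-M" refers to this repo's Monte Carlo estimate of returns, the usual step toward the standard PPO formulation is to estimate advantages with Generalized Advantage Estimation (GAE) from per-step value estimates instead of full-episode returns. The sketch below is illustrative only; the function and variable names are not from this repo.)

```python
import torch

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """GAE over one rollout.

    rewards, values, dones: per-step lists (or 1-D tensors) collected during the rollout.
    last_value: critic estimate for the state after the final step (0.0 if it was terminal).
    """
    T = len(rewards)
    advantages = torch.zeros(T)
    returns = torch.zeros(T)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        mask = 1.0 - float(dones[t])            # no bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae  # exponentially weighted sum of TD errors
        advantages[t] = gae
        returns[t] = gae + values[t]            # regression target for the critic
        next_value = values[t]
    return advantages, returns
```

The (typically normalized) advantages would then replace the discounted Monte Carlo returns in the clipped surrogate objective, and `returns` would serve as the critic's target.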

nikhilbarhate99 commented 4 years ago

Hey, I would not recommend using this repo for complicated environments. It is a very simplified version meant for understanding / learning PPO. I would suggest you write a Gym API for your env and use the other repos as they are, instead of trying to modify them.
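(For reference, wrapping a custom environment behind the classic Gym API only requires `reset`, `step`, and the observation/action spaces. The class below is a hypothetical skeleton, not part of this repo; the space shapes and the simulator calls would be specific to your robot.)

```python
import gym
import numpy as np
from gym import spaces

class MyRobotEnv(gym.Env):
    """Hypothetical Gym wrapper around a custom (e.g. ROS-based) simulator."""

    def __init__(self):
        # Example spaces; adjust dimensions and bounds to your robot.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        # Reset the simulator / robot here and return the initial observation.
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        # Apply the action, advance the simulation, compute reward and termination.
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info
```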

murtazabasu commented 4 years ago

That would be an option, but as I mentioned before, I am working with ROS, which comes with Python 2.7, so writing a Gym API wouldn't really help: I would still need OpenAI Baselines for VecNormalizing the env, which is only possible with Python 3.5+.
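(If observation normalization is the only thing Baselines is needed for, a small dependency-free running mean/std wrapper can stand in for that part of `VecNormalize` and runs under Python 2.7 as well. The class below is a sketch under that assumption, not code from Baselines or this repo.)

```python
import numpy as np

class RunningObsNorm(object):
    """Per-dimension running mean/std normalization of observations,
    a lightweight stand-in for the observation part of VecNormalize."""

    def __init__(self, shape, clip=10.0, eps=1e-8):
        self.n = 0
        self.mean = np.zeros(shape, dtype=np.float64)
        self.mean_sq = np.zeros(shape, dtype=np.float64)
        self.clip = clip
        self.eps = eps

    def update(self, obs):
        # Incremental update of the running mean and mean of squares.
        self.n += 1
        self.mean += (obs - self.mean) / self.n
        self.mean_sq += (obs ** 2 - self.mean_sq) / self.n

    def __call__(self, obs):
        self.update(obs)
        var = np.maximum(self.mean_sq - self.mean ** 2, 0.0)
        return np.clip((obs - self.mean) / (np.sqrt(var) + self.eps), -self.clip, self.clip)
```

Each raw observation from the environment would then be passed through the wrapper before being fed to the policy, e.g. `obs = normalizer(raw_obs)`.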