Closed. EpicSpaces closed this issue 4 years ago.
@oussama00000, this question is better asked on Stack Overflow, since it is not a TensorFlow bug or feature request. There is also a larger community that reads questions there. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
I've implemented proximal policy optimization (PPO) in TensorFlow for the Pendulum-v0 environment:

```python
import gym
import numpy as np
import tensorflow as tf

class Memory(object):
    pass  # rollout buffer for (state, action, reward) transitions; body elided in the issue

class ActorNetwork(object):
    pass  # policy network producing the action distribution; body elided in the issue

class ValueNetwork(object):
    pass  # critic network estimating the state value; body elided in the issue

class PPO(object):
    pass  # agent combining actor, critic, and the clipped surrogate update; body elided in the issue

env = gym.make('Pendulum-v0')
env.seed(1)
env = env.unwrapped

agent = PPO(act_dim=env.action_space.shape[0],
            obs_dim=env.observation_space.shape[0],
            lr_actor=0.0004, lr_value=0.0003,
            gamma=0.9, clip_range=0.2)

nepisode = 1000
nstep = 200

for i_episode in range(nepisode):
    obs0 = env.reset()
    ep_rwd = 0
```
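The `clip_range=0.2` passed to the agent refers to PPO's clipped surrogate objective. The issue does not include the loss implementation, so here is a minimal NumPy sketch of that objective; the function name `clipped_surrogate_loss` and the example inputs are my own, not from the poster's code:

```python
import numpy as np

def clipped_surrogate_loss(ratio, advantage, clip_range=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); PPO takes the elementwise minimum
    # of the unclipped and clipped objectives, then negates the mean so the
    # result can be minimized with gradient descent.
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

# Hypothetical probability ratios and advantage estimates for three samples.
ratio = np.array([1.5, 0.5, 1.0])
advantage = np.array([1.0, -1.0, 2.0])
loss = clipped_surrogate_loss(ratio, advantage)  # -0.8
```

Clipping caps how far a single update can push the policy away from the one that collected the data, which is what lets PPO reuse each rollout for several gradient steps.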
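The constructor also takes `gamma=0.9`, the discount factor used when turning the per-step rewards collected in the loop into return targets for the value network. A small sketch of that computation, under the usual backward-recursion convention (the helper name is mine, not from the issue):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.9):
    # Walk backwards so each step's return bootstraps from the next one:
    # G[t] = r[t] + gamma * G[t+1].
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

returns = discounted_returns([1.0, 1.0, 1.0])  # [2.71, 1.9, 1.0]
```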