uidilr / gail_ppo_tf
TensorFlow implementation of Generative Adversarial Imitation Learning (GAIL) with discrete action
MIT License · 112 stars · 29 forks
Issues (closed)
#22 · gaes = (gaes - gaes.mean()) / gaes.std() · Joll123 · closed 4 years ago · 1 comment
#21 · PPO implementation and KL divergence of the GAIL paper · mehdimashayekhi · closed 6 years ago · 1 comment
#20 · Roll out of the policy and collecting data, especially reward · mehdimashayekhi · closed 6 years ago · 1 comment
#19 · No render for CartPole-v0 · navuboy · closed 6 years ago · 1 comment
#18 · Is there any way to train the generator with batches from previous rollouts (something like a replay buffer)? · shamanez · closed 6 years ago · 1 comment
#17 · Missing (-1) in the loss function in BC · shamanez · closed 6 years ago · 1 comment
#16 · Is there any way we can encourage the exploration strategy of the agent? · shamanez · closed 6 years ago · 1 comment
#15 · Can you implement WGAN loss as the discriminator loss? · shamanez · closed 6 years ago · 1 comment
#14 · Where do you use the get_grad function? · shamanez · closed 6 years ago · 1 comment
#13 · Visualize the gradients in the discriminator on TensorBoard · shamanez · closed 6 years ago · 4 comments
#12 · Discriminator statistics are not stored on TensorBoard · shamanez · closed 6 years ago · 1 comment
#11 · Application of batch normalization and dropout · shamanez · closed 6 years ago · 1 comment
#10 · Discriminator training with the expert's trajectories? · shamanez · closed 6 years ago · 1 comment
#9 · GAIL training: train the policy net first with behavioural cloning and fine-tune with GAIL? · shamanez · closed 6 years ago · 4 comments
#8 · Your implementation is different from the GAIL paper · Guiliang · closed 6 years ago · 4 comments
#7 · What is the "temp" value used in getting the output of the policy network? · shamanez · closed 6 years ago · 1 comment
#6 · Discriminator optimisation with each [state, action] · shamanez · closed 6 years ago · 6 comments
#5 · Discriminator optimisation in GAIL · shamanez · closed 6 years ago · 1 comment
#4 · About the old and new policy? · shamanez · closed 6 years ago · 1 comment
#3 · About the critic network optimisation? · shamanez · closed 6 years ago · 2 comments
#2 · How did you implement Proximal Policy Gradients in run_ppo.py? · shamanez · closed 6 years ago · 2 comments
#1 · Some issue about the setting of 'stochastic' · LinBornRain · closed 6 years ago · 4 comments
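
Issue #22's title quotes the repo's advantage-normalization line. A common pitfall with that exact expression is division by zero when every advantage in a batch is identical (gaes.std() == 0). A minimal NumPy sketch of a guarded version — the normalize_gaes name and eps parameter are illustrative, not the repo's code:

```python
import numpy as np

def normalize_gaes(gaes, eps=1e-8):
    """Standardize GAE advantages to zero mean, unit std.

    eps guards the denominator against a zero standard deviation,
    e.g. when all advantages in the batch are equal.
    (Illustrative helper; not taken from gail_ppo_tf.)
    """
    gaes = np.asarray(gaes, dtype=np.float64)
    return (gaes - gaes.mean()) / (gaes.std() + eps)
```

With the epsilon in place, a constant batch normalizes to all zeros instead of raising a division-by-zero warning or producing NaNs.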
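
Issue #2 asks how PPO is implemented in run_ppo.py. Without restating the repo's code, the heart of PPO is the clipped surrogate objective: the probability ratio between the new and old policy is clipped to [1 - eps, 1 + eps] so a single update cannot move the policy too far. A generic NumPy sketch, assuming log-probabilities are available (function name and signature are illustrative):

```python
import numpy as np

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Generic PPO clipped surrogate (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities
    for numerical stability. (Illustrative sketch, not the repo's code.)
    """
    ratio = np.exp(new_logp - old_logp)
    # Clip the ratio, then take the elementwise minimum of the
    # unclipped and clipped surrogates (pessimistic bound).
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When new_logp == old_logp the ratio is 1 everywhere and the objective reduces to the mean advantage; a ratio far outside [0.8, 1.2] contributes only its clipped value, which is what removes the incentive for overly large policy steps.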