openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

How to generate deterministic.ppo2...npz and stochastic.ppo2..npz as expert policy in gail #584

Open huangjiancong1 opened 6 years ago

huangjiancong1 commented 6 years ago

I want to use ppo2 or trpo to sample a random policy and then use gail for imitation learning on it. Could you share some ideas on how to do this? I would greatly appreciate your help.

andrewliao11 commented 6 years ago

Here's the data structure of the npz file:

{
    'ep_rets': np.array with shape (1500,),
    'obs': np.array with shape (1500, T, O),
    'rews': np.array with shape (1500, T),
    'acs': np.array with shape (1500, T, A)
}

where T, O, and A are the time horizon, the observation dimension, and the action dimension, respectively. The leading dimension (1500 here) indexes episodes, with one entry of 'ep_rets' per episode.
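As a rough illustration of the layout above, here is a minimal sketch that rolls out a policy and writes the four arrays with `np.savez`. Everything besides the four key names is an assumption made for the example: `collect_expert_npz`, `toy_policy`, `toy_reset`, and `toy_step` are hypothetical names, the toy environment stands in for a real gym environment, and every episode is assumed to run for exactly T steps (real episodes can terminate early and would need padding or another storage scheme). It is not the script used to produce the original expert files.

```python
import os
import tempfile

import numpy as np


def collect_expert_npz(policy, reset, step, n_episodes, horizon, path):
    """Roll out `policy` for fixed-length episodes and save them as an npz.

    Hypothetical helper: `reset()` returns an observation and
    `step(ob, ac)` returns (next_ob, reward).
    """
    obs, acs, rews = [], [], []
    for _ in range(n_episodes):
        ob = reset()
        ep_obs, ep_acs, ep_rews = [], [], []
        for _ in range(horizon):
            ac = policy(ob)
            ep_obs.append(ob)       # observation the action was chosen from
            ep_acs.append(ac)
            ob, rew = step(ob, ac)
            ep_rews.append(rew)
        obs.append(ep_obs)
        acs.append(ep_acs)
        rews.append(ep_rews)

    obs = np.asarray(obs, dtype=np.float32)    # shape (N, T, O)
    acs = np.asarray(acs, dtype=np.float32)    # shape (N, T, A)
    rews = np.asarray(rews, dtype=np.float32)  # shape (N, T)
    ep_rets = rews.sum(axis=1)                 # shape (N,)
    np.savez(path, ep_rets=ep_rets, obs=obs, rews=rews, acs=acs)
    return path


# Toy stand-ins (assumptions): 3-dim observations, 2-dim actions.
rng = np.random.default_rng(0)


def toy_reset():
    return rng.standard_normal(3)


def toy_policy(ob):
    # A random policy; a trained ppo2/trpo policy would be queried here instead.
    return rng.uniform(-1.0, 1.0, size=2)


def toy_step(ob, ac):
    next_ob = ob + 0.1 * rng.standard_normal(3)
    reward = -float(np.sum(ac ** 2))
    return next_ob, reward


out_path = os.path.join(tempfile.mkdtemp(), "toy_expert.npz")
collect_expert_npz(toy_policy, toy_reset, toy_step,
                   n_episodes=5, horizon=10, path=out_path)

data = np.load(out_path)
for key in ("ep_rets", "obs", "rews", "acs"):
    print(key, data[key].shape)
```

With a trained policy, `toy_policy` would be replaced by a call that returns the policy's action for an observation; whether that call samples or takes the mode presumably corresponds to the stochastic and deterministic file variants named in the issue title.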

huangjiancong1 commented 6 years ago

Does it have the same structure as the model file generated by the command: python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy --save_path=~/ppo2/models ?