Open huangjiancong1 opened 6 years ago
Here is the data structure of the npz file:
{
'ep_rets': np.array with shape (1500,),
'obs': np.array with shape (1500, T, O),
'rews': np.array with shape (1500, T),
'acs': np.array with shape (1500, T, A)
}
where T, O, and A represent time horizon, observation space, and action space respectively.
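For reference, a minimal sketch of how a file with this layout could be written and inspected with NumPy. The values of T, O, and A and the file name `demo_trajs.npz` are placeholders chosen for illustration, not values from the actual dataset.

```python
import os
import tempfile

import numpy as np

# Placeholder sizes: only the 1500 comes from the description above;
# T, O and A are assumed small values for illustration.
N, T, O, A = 1500, 5, 3, 2

data = {
    "ep_rets": np.zeros(N),       # one return per episode
    "obs": np.zeros((N, T, O)),   # observations per step
    "rews": np.zeros((N, T)),     # rewards per step
    "acs": np.zeros((N, T, A)),   # actions per step
}

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_trajs.npz")
np.savez(path, **data)

# Load the archive back and collect the shape of every array in it.
loaded = np.load(path)
shapes = {key: loaded[key].shape for key in loaded.files}
for key in sorted(shapes):
    print(key, shapes[key])
```

Running the same loading loop on any other file shows quickly whether it contains the same four keys and shapes.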
Does the models file generated by the following command have the same structure?
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy --save_path=~/ppo2/models
I am asking because I want to use ppo2 or trpo to sample a random policy and then use GAIL for imitation learning. Could you share some ideas with me? I would greatly appreciate your help.
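To make the goal concrete, here is a rough sketch of how rollouts from some policy could be packed into the four arrays described above. `ToyEnv`, `random_policy`, and `collect_trajectories` are hypothetical names written for this illustration and are not part of baselines; the sketch also assumes every episode has the same fixed length T so that the arrays are rectangular.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (reset/step)."""

    def __init__(self, obs_dim=3, horizon=5, seed=0):
        self.obs_dim = obs_dim
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.standard_normal(self.obs_dim)

    def step(self, action):
        self.t += 1
        obs = self.rng.standard_normal(self.obs_dim)
        reward = -float(np.sum(np.square(action)))
        done = self.t >= self.horizon  # fixed-length episodes
        return obs, reward, done, {}


_policy_rng = np.random.default_rng(1)


def random_policy(ob, ac_dim=2):
    """Placeholder policy: ignores the observation, samples a random action."""
    return _policy_rng.uniform(-1.0, 1.0, size=ac_dim)


def collect_trajectories(env, policy, num_episodes):
    """Roll out `policy` and stack the results into (N, T, ...) arrays."""
    obs_all, acs_all, rews_all = [], [], []
    for _ in range(num_episodes):
        ob, done = env.reset(), False
        ep_obs, ep_acs, ep_rews = [], [], []
        while not done:
            ac = policy(ob)
            ep_obs.append(ob)
            ep_acs.append(ac)
            ob, rew, done, _ = env.step(ac)
            ep_rews.append(rew)
        obs_all.append(ep_obs)
        acs_all.append(ep_acs)
        rews_all.append(ep_rews)
    rews = np.array(rews_all)
    return {
        "ep_rets": rews.sum(axis=1),
        "obs": np.array(obs_all),
        "rews": rews,
        "acs": np.array(acs_all),
    }


dataset = collect_trajectories(ToyEnv(), random_policy, num_episodes=4)
for key in sorted(dataset):
    print(key, dataset[key].shape)
```

In a real setup the trained ppo2 or trpo model would take the place of `random_policy`, and the resulting dictionary could be written out with `np.savez`. How to load the trained model depends on the library and is not shown here.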