''python example/trpo_swimmer.py'' works well. With the default settings, it reaches an average reward of 55.72 after 40 iterations.
When I run trpo_swimmer.py in ''stub'' mode (I simply add ''stub(globals())'' at the beginning and replace ''algo.train()'' with ''run_experiment_lite(...)'', following the difference between ddpg_cartpole and ddpg_cartpole_stub), it still works. However, with the same default settings, it reaches an average reward of only 49.59. I tried different random seeds, but the difference remained.
I'm wondering why this difference exists.