quangr closed this issue 1 year ago
I can't find a way to make my PPO comparable to the Tianshou benchmark, especially in the HalfCheetah env, where I can't achieve half of their score.
Benchmark:
Tianshou: Hopper-v3: 2609.3±700.8, HalfCheetah-v3: 5783.9±1244.0
Mine: Hopper-v3: 1683±307, HalfCheetah-v3: 1926±254
Where does it go wrong?
So far I have tested the following hypotheses:
Result: adding masking in the PPO step and using value bootstrapping does not improve much (see the GAE sketch after this list).
Result: changing to a different version doesn't help.
Result: setting the learning rate to a constant, or setting the total steps to 3M, does not improve much.
Result: copying the remap method from Tianshou still doesn't work.
Result: when using the exact data from Tianshou, both implementations produce the same loss.
Result: don't know how to test this.
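For reference, this is a minimal sketch of what I mean by masking and value bootstrapping in the advantage computation (compute_gae and its argument names are my own, not from either codebase): true terminals mask the bootstrap term, while time-limit truncations are left unmarked so the next-state value is still bootstrapped through them.

import numpy as np

def compute_gae(rewards, values, next_values, terminals,
                gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation with terminal masking.

    rewards, values, next_values, terminals: arrays of shape (T,).
    `terminals` marks true episode ends only; time-limit truncations
    should NOT be marked terminal, so V(s_{t+1}) is bootstrapped
    through them instead of being zeroed out.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_gae = 0.0
    for t in reversed(range(T)):
        mask = 1.0 - terminals[t]          # zero the bootstrap at true terminals
        delta = rewards[t] + gamma * next_values[t] * mask - values[t]
        last_gae = delta + gamma * gae_lambda * mask * last_gae
        advantages[t] = last_gae
    returns = advantages + values          # targets for the value loss
    return advantages, returns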
It turns out that we need an observation normalizer.
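For anyone hitting the same wall, here is a minimal sketch of the kind of running mean/std observation normalizer that closes the gap. This is my own sketch against the classic gym wrapper API, not Tianshou's exact implementation; gym/gymnasium ship a similar built-in wrapper, and Tianshou applies the same idea to its vectorized envs.

import numpy as np
import gym

class NormalizeObservation(gym.ObservationWrapper):
    """Normalize observations with running mean/std statistics
    (Welford-style online update), then clip. Sketch only."""

    def __init__(self, env, eps=1e-8, clip=10.0):
        super().__init__(env)
        shape = env.observation_space.shape
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps
        self.eps = eps
        self.clip = clip

    def observation(self, obs):
        # Update running statistics with the new sample.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        # Normalize with the current statistics and clip outliers.
        norm = (obs - self.mean) / np.sqrt(self.var + self.eps)
        return np.clip(norm, -self.clip, self.clip)

Note that the same (frozen) statistics have to be applied at evaluation time, otherwise the policy sees a different observation distribution than the one it was trained on.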