rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License

metaworld example for MT1 pick-place does not work #2167

Open guoyijie opened 3 years ago

guoyijie commented 3 years ago

For the code in examples/torch/mtsac_metaworld_mt1_pick_place.py, the agent is not able to learn a good policy. After 10e6 environment steps, the success rate is still 0 and the average return is always negative. Is this the expected result?
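For context, Meta-World environments report task success through the per-step `info` dict (`info['success']`), and the success rate logged during evaluation is the fraction of episodes in which that flag is ever set. A minimal sketch of that metric, with a hypothetical `success_rate` helper and toy episode data (not garage's actual logging code):

```python
def success_rate(episodes):
    """Fraction of episodes in which the env reported success at any step.

    `episodes` is a list of episodes; each episode is a list of per-step
    info dicts, mirroring Meta-World's `info['success']` convention.
    """
    if not episodes:
        return 0.0
    solved = sum(
        1 for ep in episodes
        if any(info.get('success', 0) for info in ep)
    )
    return solved / len(episodes)


# Toy data: one episode that eventually succeeds, one that never does.
episodes = [
    [{'success': 0}, {'success': 0}, {'success': 1}],
    [{'success': 0}, {'success': 0}],
]
print(success_rate(episodes))  # 0.5
```

A success rate pinned at exactly 0 therefore means the success flag was never raised in any evaluation episode, which is a stronger signal than a merely low average return.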

I installed metaworld and garage with the command `pip install -e .[dev]`.

ryanjulian commented 3 years ago

@avnishn

avnishn commented 3 years ago

@guoyijie the current set of V1 metaworld environments is highly sensitive to the seed, so I wouldn't say it's out of the question that MT1 pick-place got such low performance.

We're releasing the metaworld-v2 environments in mid-November, and I expect to see a performance increase under different seeds.

Can you upload a tensorboard link, and also try running the experiment once more?

Thanks! Avnish

ryanjulian commented 3 years ago

@guoyijie you can send us a tensorboard link by using the https://tensorboard.dev service.

Do you mind trying MT1-reach? That would be a more reliable indication of a possible problem.

guoyijie commented 3 years ago

Thanks for your reply. Here is the additional information you requested.

(1) "tensorboard link": Here is the log I got by running the script "mtsac_metaworld_mt1_pick_place.py" https://tensorboard.dev/experiment/1Y42H2DbRUWobvG4Esdd9w/

(2) "run the experiment once more": I actually ran the script several times (though always with the same seed, 1), and the result was the same each time: the policy never achieves a positive success rate. I'm now trying a different seed, but I'm not sure whether it will help.
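Repeated runs with the same seed producing identical results is expected when the experiment is fully seeded: the seed determines every random draw, so the run is deterministic. A toy illustration of this (a stand-in `rollout_return` function, not the garage training loop):

```python
import random


def rollout_return(seed, n_steps=5):
    """Toy stand-in for a seeded RL run: the 'return' depends only on the seed."""
    rng = random.Random(seed)  # seeded RNG makes every draw reproducible
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_steps))


# Same seed -> identical outcome on every rerun, so repeating a failed
# run with seed 1 reproduces the failure exactly.
print(rollout_return(1) == rollout_return(1))  # True

# Only changing the seed changes the sequence of random draws.
print(rollout_return(1), rollout_return(2))
```

This is why rerunning with seed 1 alone is not an independent trial; trying several distinct seeds is the meaningful test of seed sensitivity.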

(3) "try MT1-reach": yes, I tried MT1-reach. The learned policy solves that task easily, with a success rate of 1.

Could you please let me know whether anything in the tensorboard log looks wrong, and whether you have any further suggestions?