sfujim / BCQ

Author's PyTorch implementation of BCQ for continuous and discrete actions
MIT License

Can you please post software packages configuration? #2

Closed by quanvuong 5 years ago

quanvuong commented 5 years ago

Thank you for the clean codebase!

I'm trying to reproduce the results in the paper, but I'm unable to obtain an expert with good performance while running the code as is.

I suspect it's because of a software version mismatch. Can you please post the software package configuration that you used to run the code?

Thank you!

The packages in my conda environment can be found below. With these package versions, I was only able to obtain a policy with an episode return of approximately 230 on Hopper-v1, which is quite low.

conda_packages.pdf

sfujim commented 5 years ago

Hey, thanks for reporting this! I'm not sure of the exact software versions now, tbh, but I will check this out on my end and see if it works. I was also planning on updating some of the code to Python 3 soon, and making sure everything still works at that point as well.

Did you try more than 1 seed? DDPG is rather unstable, and we took the best X out of Y seeds to use as the expert for some of the experiments in the paper.
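For readers following along, the "best X out of Y seeds" selection could be sketched roughly as below. This is a minimal illustration, not the procedure used for the paper: the function name `best_seeds` and the dictionary of per-seed returns are hypothetical, and the return values are placeholders (only the 230 figure comes from this thread).

```python
from typing import Dict, List


def best_seeds(returns_by_seed: Dict[int, float], k: int) -> List[int]:
    """Return the k seeds with the highest evaluation return, best first.

    `returns_by_seed` maps a seed to the average episode return of the
    policy trained with that seed (hypothetical structure, filled in by
    whatever training/evaluation loop you already have).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(returns_by_seed, key=returns_by_seed.get, reverse=True)
    return ranked[:k]


# Placeholder numbers for illustration only; seed 0 mirrors the ~230
# return reported above, the others are made up.
example_returns = {0: 230.0, 1: 3100.0, 2: 1200.0}
expert_seeds = best_seeds(example_returns, k=1)
```

With several seeds trained, the selected seed's policy would then serve as the expert for generating the buffer.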

quanvuong commented 5 years ago

Thank you for getting back to me!

Let me try other seed values : ).

DanielTakeshi commented 4 years ago

In particular, I'm wondering if the MuJoCo version used can be posted here, both for OpenAI's mujoco-py package and the MuJoCo downloaded from Todorov? It looks like using MuJoCo 2.0 causes some issues. https://github.com/openai/gym/issues/1541

sfujim commented 4 years ago

Sorry for the late reply. Unfortunately, the exact versions were not something I recorded. However, after doing some detective work: the original experiments were run in Python 2, meaning the MuJoCo version was likely 1.31 and the mujoco-py version was likely 0.5.7.
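One way to avoid losing this information in future runs is to log package versions next to the results. The snippet below is a sketch under assumptions: it needs Python 3.8+ (for `importlib.metadata`), the helper name `package_versions` is hypothetical, and the package names listed are the PyPI distribution names one would expect for this setup, so adjust them to whatever is actually installed.

```python
import json
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {"python": sys.version.split()[0]}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; record that explicitly.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; change to match your environment.
    report = package_versions(["torch", "gym", "mujoco-py", "numpy"])
    print(json.dumps(report, indent=2))
```

Saving this output alongside each experiment's results makes a later version mismatch much easier to diagnose. Note that it only covers Python packages; the MuJoCo binary version would still need to be recorded separately.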

sfujim commented 4 years ago

I haven't tested whether the MuJoCo version matters with the current iteration of the code, but nobody has reported any other issues, so hopefully not :^)

quanvuong commented 4 years ago

You're so kind to answer questions about a research codebase over such an extended period of time, Scott Fujimoto : ).