vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

DQN on MountainCar #255

Closed qsh-zh closed 2 years ago

qsh-zh commented 2 years ago

Problem Description

The PyTorch DQN fails to learn on MountainCar. I tried the two settings described in this issue.

Current Behavior

[image]

Expected Behavior

DQN should learn the policy.

Possible Solution

I am not sure what can be done. It is quite surprising that DQN fails on such a simple env.

Steps to Reproduce

The modifications in the hotfix are the same as those in the issue.

# DQN-hotfix
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=False,
)

# DQN
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=True,
)
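
For context, here is a minimal sketch of what the `handle_timeout_termination` flag is generally understood to change. It assumes the buffer multiplies the stored `done` flag by `(1 - timeout)` when sampling, so that an episode cut off by the time limit still bootstraps from the next state. The function name `td_target`, the `timeouts` array, and all numbers are illustrative assumptions, not code from the repository.

```python
import numpy as np


def td_target(rewards, next_q_max, dones, timeouts,
              gamma=0.99, handle_timeout_termination=True):
    """One-step DQN target: y = r + gamma * max_a Q(s', a) * (1 - done).

    Hypothetical helper for illustration. `timeouts` marks transitions whose
    episode ended only because the time limit was reached.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_max = np.asarray(next_q_max, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    timeouts = np.asarray(timeouts, dtype=np.float64)

    if handle_timeout_termination:
        # A timeout is not a true terminal state: clear the done flag
        # so the target keeps the bootstrapped value of the next state.
        effective_dones = dones * (1.0 - timeouts)
    else:
        # Every episode end, including a timeout, is treated as terminal.
        effective_dones = dones

    return rewards + gamma * next_q_max * (1.0 - effective_dones)


# Two final transitions with reward -1: the first ended by the time limit,
# the second by actually reaching a terminal state.
rewards = [-1.0, -1.0]
next_q_max = [-50.0, -50.0]
dones = [1.0, 1.0]
timeouts = [1.0, 0.0]

y_on = td_target(rewards, next_q_max, dones, timeouts,
                 handle_timeout_termination=True)
y_off = td_target(rewards, next_q_max, dones, timeouts,
                  handle_timeout_termination=False)
```

Under these assumptions, the two settings differ only on transitions that end by time limit. In MountainCar-v0 an unsuccessful episode is cut off at 200 steps, so early in training most episode ends are of that kind.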

vwxyzjn commented 2 years ago

Hello, thanks for reporting. Could you check if your performance matches the reported performance in the docs? https://docs.cleanrl.dev/rl-algorithms/dqn/#experiment-results_1

Basically, the performance is not that great: I found it difficult to find a set of hyperparameters that works well for all three games we tested.

qsh-zh commented 2 years ago

@vwxyzjn Thanks for your fast response. I think the performance almost matches what we have in the docs.

Except for the second random seed, seeds 1 and 3 behave very similarly in my experiments (they never show any improvement over a random policy).

Do you think the unsatisfying performance is due to suboptimal hyperparameters? Or can DQN simply not do well in this challenging env?

Thanks,

vwxyzjn commented 2 years ago

Yeah, it is unsatisfactory. We always welcome new contributors! If you are interested in trying out https://github.com/vwxyzjn/cleanrl/pull/228 to find a set of params that works well for CartPole-v1, MountainCar-v0, and Acrobot-v1, that would be great.
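
One way to compare candidate parameter sets across the three environments is to normalize each environment's episodic return to a common scale before averaging. The sketch below is only an illustration of that idea, not the method used in the linked pull request. The names `RETURN_RANGES`, `normalized_score`, and `aggregate_score` are hypothetical, and the upper bounds chosen for MountainCar-v0 and Acrobot-v1 are assumptions picked for the example.

```python
from statistics import mean

# Hypothetical (low, high) episodic-return ranges used only for normalization.
# The lower bounds for MountainCar-v0 and Acrobot-v1 and the upper bound for
# CartPole-v1 follow from the environments' time limits; the other bounds
# are assumptions.
RETURN_RANGES = {
    "CartPole-v1": (0.0, 500.0),
    "MountainCar-v0": (-200.0, -100.0),
    "Acrobot-v1": (-500.0, -60.0),
}


def normalized_score(env_id, episodic_return):
    """Map a raw episodic return to [0, 1] using the env's reference range."""
    low, high = RETURN_RANGES[env_id]
    score = (episodic_return - low) / (high - low)
    # Clip so that a single environment cannot dominate the average.
    return min(1.0, max(0.0, score))


def aggregate_score(returns_by_env):
    """Average the normalized scores over all evaluated environments."""
    return mean(normalized_score(env_id, ret)
                for env_id, ret in returns_by_env.items())


# A parameter set that solves CartPole but never improves on the other two.
one_env_only = aggregate_score({
    "CartPole-v1": 500.0,
    "MountainCar-v0": -200.0,
    "Acrobot-v1": -500.0,
})
```

With this kind of score, a configuration that only learns CartPole-v1 is ranked below one that makes moderate progress on all three environments.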