DQN on MountainCar - Githubissues

qsh-zh commented 2 years ago


Problem Description

The PyTorch DQN fails on MountainCar. I tried the two settings from the issue.

Checklist

[x] I have installed dependencies via poetry install (see CleanRL's installation guideline).
[x] I have checked that there is no similar issue in the repo (required)

Current Behavior

Expected Behavior

DQN should learn the policy.

Possible Solution

I am not sure what can be done. It is quite surprising that DQN fails on such a simple env.

Steps to Reproduce

The modifications in the hotfix are the same as those in the issue:

# DQN-hotfix
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=False,
)

# DQN
rb = ReplayBuffer(
    args.buffer_size,
    envs.single_observation_space,
    envs.single_action_space,
    device,
    handle_timeout_termination=True,
)
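For context, here is a minimal sketch of how I understand the flag to affect the one-step TD target. It assumes that with handle_timeout_termination=True the buffer treats a time-limit cutoff as non-terminal and keeps bootstrapping from the next state. The function td_targets, its arguments, and the example values are hypothetical and only illustrate the idea; this is not the library's actual code.

```python
def td_targets(rewards, next_q_max, dones, timeouts, gamma=0.99,
               handle_timeout_termination=True):
    """Hypothetical helper: one-step DQN targets for a batch of transitions.

    Assumption: when handle_timeout_termination is True, a transition that
    ended only because of the time limit is treated as non-terminal.
    """
    targets = []
    for r, q, done, timeout in zip(rewards, next_q_max, dones, timeouts):
        if handle_timeout_termination and timeout:
            # Time-limit cutoff: the state is not truly terminal, keep bootstrapping.
            done = False
        not_done = 0.0 if done else 1.0
        targets.append(r + gamma * q * not_done)
    return targets


# A MountainCar-style transition cut off by the step limit
# (reward -1, done=True, timeout=True; the next-state value is made up).
rewards, next_q_max, dones, timeouts = [-1.0], [-50.0], [True], [True]

with_flag = td_targets(rewards, next_q_max, dones, timeouts,
                       handle_timeout_termination=True)
without_flag = td_targets(rewards, next_q_max, dones, timeouts,
                          handle_timeout_termination=False)
```

With the flag on, the target still includes the discounted next-state value (about -50.5 here); with it off, the cutoff is treated as a true terminal state and the target is just the reward, -1.0.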

vwxyzjn commented 2 years ago

Hello, thanks for reporting. Could you check whether your performance matches the reported performance in the docs? https://docs.cleanrl.dev/rl-algorithms/dqn/#experiment-results_1

Basically, the performance is not that great because I found it difficult to find a set of hyperparameters that works well for all three games we have tested.

qsh-zh commented 2 years ago

@vwxyzjn Thanks for your fast response. I think the performance almost matches what we have in the docs.

Except for the second random seed, seeds 1 and 3 behave very similarly in my experiments (they never show any improvement over a random policy).

Do you think the unsatisfying performance is due to suboptimal hyperparameters, or can DQN simply not do well in this challenging env?

Thanks,