openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

HER does not converge on simple envs that adhere to GoalEnv interface #428

Open avaziri opened 6 years ago

avaziri commented 6 years ago

System information

avaziri commented 6 years ago

ContinuousGoalOrientedMoveToPoint-v0 Environment

Env Summary:

The goal of this environment is to move the agent (a red dot) so that it touches the goal (a green dot). The agent has momentum and bounces off the walls if it hits one.

Gif of Environment

Please excuse the inconsistent frame rate; I was controlling the agent with a keyboard and did not press the keys at a constant frequency. goal_oriented_move_to_point gif

Env Details

State Space: positions in the unit square; velocities between -0.05 and +0.05
Goal Space: any point in the unit square, re-sampled each episode
Initial State Space: a point in the unit square and a velocity within the velocity bounds, both re-sampled each episode
Action Space: X and Y force components, each between -1 and 1
Reward: 1 for reaching the goal, 0 otherwise
Terminal Condition: when the goal is achieved or the time limit is reached (the time limit comes from a wrapper)
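For concreteness, the sparse reward above can be sketched as a GoalEnv-style `compute_reward`. This is an illustrative sketch, not the issue's actual code; the 0.05 success threshold is an assumption, since the real environment defines its own notion of "touching" the goal:

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, info=None, threshold=0.05):
    """Sparse reward as described above: 1 for reaching the goal, 0 otherwise.

    The 0.05 distance threshold is hypothetical, chosen only for illustration.
    """
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 1.0 if distance <= threshold else 0.0
```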

Source code

continuous_goal_oriented_particle.py.gz Make sure to register the environment with a time limit.

from gym.envs.registration import register

def register_move_to_point_env():
    register(
        id='ContinuousGoalOrientedMoveToPoint-v0',
        entry_point='baselines.her.experiment.continuous_goal_oriented_particle:ContinuousGoalOrientedMoveToPoint',
        max_episode_steps=250,  # enforced by gym's TimeLimit wrapper
        reward_threshold=1,
    )
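For readers reproducing this, here is a minimal sketch (not the author's implementation; class and attribute names are hypothetical) of the observation layout the gym GoalEnv interface expects from such an environment: a dict with 'observation', 'achieved_goal', and 'desired_goal' keys, with sampling that mirrors the env details above:

```python
import numpy as np

class MoveToPointSketch:
    """Illustrative skeleton only: mirrors the env details above with the
    GoalEnv-style dict observation that HER consumes."""

    def __init__(self, seed=0):
        self.rng = np.random.RandomState(seed)
        self.reset()

    def reset(self):
        # position and goal anywhere in the unit square; velocity in [-0.05, 0.05]
        self.pos = self.rng.uniform(0.0, 1.0, size=2)
        self.vel = self.rng.uniform(-0.05, 0.05, size=2)
        self.goal = self.rng.uniform(0.0, 1.0, size=2)
        return self._get_obs()

    def _get_obs(self):
        return {
            'observation': np.concatenate([self.pos, self.vel]),
            'achieved_goal': self.pos.copy(),
            'desired_goal': self.goal.copy(),
        }
```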
avaziri commented 6 years ago

@mandrychowicz and/or other developers: I know it is not your job to solve every random environment people propose, but I hope you can look into this one. I have spent a lot of time debugging it and am only asking after exhausting my own ideas. I believe making sure this environment works is worth your time. It provides value to users by giving them (1) at least one goal-oriented environment that does not require a MuJoCo license, and (2) proof positive that the OpenAI Gym GoalEnv interface is sufficient for building your own goal-oriented environment.

I would love to use HER and continue to build on it at my company. Unfortunately, this simple environment, which was meant to be used for unit tests, remains unsolved by HER.

avaziri commented 6 years ago

Update: My coworker was able to solve the ContinuousGoalOrientedMoveToPoint environment with a 100% success rate using TDM, which makes me more confident that this is a problem with the Baselines HER implementation and not with the environment.

avaziri commented 6 years ago

I was unable to get HER to solve the ContinuousGoalOrientedMoveToPoint environment even with hyper-parameter tuning. I manually varied the following one at a time, to no effect:

Here is a graph of the runs; all are noisy measurements with a mean success rate of ~0.25. move_to_point_her_hyperparam

iSaran commented 6 years ago

@avaziri I also have some problems solving my environments with HER, but I'm still not sure whether the fault lies with the HER implementation or with my own code. Have you made any progress identifying the problem with your environment?

filipolszewski commented 6 years ago

Did you try using a -1 reward for not reaching the goal and a 0 reward for reaching it? Also, your environment seems easier than FetchPush; maybe try a network with one hidden layer of 20-50 neurons and a batch size of 8, or perhaps 16. Try running with HER replay probability 0.0 as well. I am very interested in what will happen.
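The suggested reward change is a one-line tweak to the sparse reward; a sketch follows (the 0.05 distance threshold is again a hypothetical stand-in for the environment's own success check). This -1/0 convention matches what the Fetch environments use:

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, info=None, threshold=0.05):
    # -1 until the goal is reached, 0 on success (the shaping suggested above);
    # the 0.05 threshold is illustrative, not from the issue
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if distance <= threshold else -1.0
```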

Did you inspect how the agent moves around the environment after several epochs of learning? I would be interested in its behaviour: is it totally random, or does it follow any kind of pattern?