sush1996 / DDPG_Fetch

Exploring the performance of Prioritized Experience Replay (PER) with the DDPG+HER scheme on the Fetch Robotics Environment

What is the performance of the PER + HER + DDPG scheme in Fetch? #1

Closed · kaixindelele closed this issue 3 years ago

kaixindelele commented 3 years ago

Hi, I also want to combine PER+HER with DDPG, because in the Fetch envs the agent cannot reach the cube during early exploration, which really wastes time. So, can PER improve the performance of DDPG?

sush1996 commented 3 years ago

The performance gain when using PER with the DDPG+HER scheme is negligible for FetchReach-v1, but it does slightly better on FetchPickAndPlace-v1 and a lot better on FetchSlide-v1. I shall update the repo with the plots soon.

kaixindelele commented 3 years ago

Thank you for your reply. I ran your code for about 400 epochs on FetchPush-v1, and the eval performance is:

```
[2021-02-08 09:57:31.051204] epoch is: 393, eval success rate is: 0.300
[2021-02-08 10:00:08.442811] epoch is: 394, eval success rate is: 0.500
[2021-02-08 10:02:41.274267] epoch is: 395, eval success rate is: 0.200
[2021-02-08 10:05:16.168315] epoch is: 396, eval success rate is: 0.400
[2021-02-08 10:07:52.791192] epoch is: 397, eval success rate is: 0.100
[2021-02-08 10:10:25.656766] epoch is: 398, eval success rate is: 0.400
[2021-02-08 10:12:56.899515] epoch is: 399, eval success rate is: 0.100
[2021-02-08 10:15:29.850096] epoch is: 400, eval success rate is: 0.100
[2021-02-08 10:17:31.082211] epoch is: 401, eval success rate is: 0.200
[2021-02-08 10:19:18.113382] epoch is: 402, eval success rate is: 0.200
[2021-02-08 10:21:57.569483] epoch is: 403, eval success rate is: 0.200
```

But with the original baselines library, most Fetch envs reach a 0.6~1.0 success rate within 300 epochs, even with a single process and a single env.

I rendered the process and looked into the recomputed reward function. At the beginning of training, the agent cannot touch the cube, so the cube position stays fixed from start to end. No matter how the goal is modified, the agent concludes that the current position is already the best; it looks "perfect" without ever touching the cube. This kind of data accounts for 80% of the samples, and the remaining 20% still carries the original sparse reward, which makes it impossible for the agent to learn a good policy.
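
To make this concrete, here is a minimal sketch of what I mean, assuming the standard Fetch sparse reward (0 if the achieved goal is within 5 cm of the desired goal, -1 otherwise); the positions below are toy numbers. When the cube never moves, every HER-relabelled transition from that episode gets reward 0 and already looks "solved":

```python
import numpy as np

def fetch_sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # Standard sparse reward in the Fetch tasks: 0 if within threshold, else -1.
    d = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -(d > threshold).astype(np.float32)

# Episode where the arm never touches the cube: the achieved goal
# (the cube position) is identical at every timestep.
cube_pos = np.array([1.3, 0.7, 0.42])           # toy cube position
achieved = np.tile(cube_pos, (50, 1))           # T x 3, constant over the episode
original_goal = np.array([1.4, 0.8, 0.42])      # toy goal, never reached -> reward -1 everywhere
print(fetch_sparse_reward(achieved, original_goal))

# HER 'future' relabelling replaces the goal with an achieved goal from a later
# timestep of the same episode -- here that is always the cube position itself,
# so every relabelled transition gets reward 0 ("already perfect").
relabelled_goal = achieved[-1]
print(fetch_sparse_reward(achieved, relabelled_goal))
```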

I used to think PER could amplify the effect of the few transitions that do contact the cube, but now I find the result is not even as good as baselines~
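
For reference, the PER idea I had in mind is the standard proportional prioritization (priority proportional to the absolute TD error). A toy sketch of why the few cube-contact transitions only get sampled more often if their TD errors actually stand out from the rest:

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    # Proportional prioritization (Schaul et al., 2016):
    # p_i = (|delta_i| + eps)^alpha, normalized into a sampling distribution.
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

# Toy buffer: 80% "cube never moved" transitions with near-zero TD error,
# 20% cube-contact transitions with a larger TD error.
td_errors = np.concatenate([np.full(80, 0.01), np.full(20, 0.5)])
probs = per_probabilities(td_errors)
print(probs[:80].sum(), probs[80:].sum())  # sampling mass on each group
```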

Maybe I missed some details?

sush1996 commented 3 years ago

Oh, that's interesting. It's been a while since I ran this program, and I'm unable to run it on my system right now due to some issues with the mujoco_py path in my bashrc. Once I resolve that, I'll try simulating it and get back to you; I'll get on this soon. I've put up snapshots of the plots: you can see that for FetchPush-v1 it converges in just about 50 epochs.

I've also uploaded the final models for each environment. I honestly don't remember what their performance was like, but I've uploaded them anyway in case you find them helpful.

kaixindelele commented 3 years ago

Some bugs with mujoco_py? Maybe I can help you~ These are my args:

```python
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    # the environment setting
    parser.add_argument('--env-name', type=str, default='FetchPush-v1', help='the environment name')
    parser.add_argument('--n-epochs', type=int, default=2000, help='the number of epochs to train the agent')
    parser.add_argument('--n-cycles', type=int, default=19, help='the times to collect samples per epoch')
    parser.add_argument('--n-batches', type=int, default=40, help='the times to update the network')
    parser.add_argument('--save-interval', type=int, default=5, help='the interval that save the trajectory')
    parser.add_argument('--seed', type=int, default=123, help='random seed')
    parser.add_argument('--num-workers', type=int, default=1, help='the number of cpus to collect samples')
    parser.add_argument('--replay-strategy', type=str, default='future', help='the HER strategy')
    parser.add_argument('--clip-return', type=float, default=50, help='if clip the returns')
    parser.add_argument('--save-dir', type=str, default='saved_models/', help='the path to save the models')
    parser.add_argument('--noise-eps', type=float, default=0.2, help='noise eps')
    parser.add_argument('--random-eps', type=float, default=0.3, help='random eps')
    parser.add_argument('--buffer-size', type=int, default=int(1e6), help='the size of the buffer')
    parser.add_argument('--replay-k', type=int, default=4, help='ratio to be replace')
    parser.add_argument('--clip-obs', type=float, default=200, help='the clip ratio')
    parser.add_argument('--batch-size', type=int, default=256, help='the sample batch size')
    parser.add_argument('--gamma', type=float, default=0.98, help='the discount factor')
    parser.add_argument('--action-l2', type=float, default=1, help='l2 reg')
    parser.add_argument('--lr-actor', type=float, default=0.001, help='the learning rate of the actor')
    parser.add_argument('--lr-critic', type=float, default=0.001, help='the learning rate of the critic')
    parser.add_argument('--polyak', type=float, default=0.95, help='the average coefficient')
    parser.add_argument('--n-test-rollouts', type=int, default=10, help='the number of tests')
    parser.add_argument('--clip-range', type=float, default=5, help='the clip range')
    parser.add_argument('--demo-length', type=int, default=20, help='the demo length')
    parser.add_argument('--cuda', type=bool, default=False, help='if use gpu do the acceleration')
    parser.add_argument('--num-rollouts-per-mpi', type=int, default=2, help='the rollouts per mpi')
    parser.add_argument('--her', type=bool, default=True, help='is HER True or False')
    parser.add_argument('--per', type=bool, default=True, help='is PER True or False')

    args = parser.parse_args()
    return args
```

If the agent can converge in just about 50 epochs, that would be really perfect!

sush1996 commented 3 years ago

Oh, if I remember right, my boolean arguments might have been inverted, and I never bothered correcting them because I was in a hurry to submit this for my coursework. Do you mind playing around with the --per and --her arguments to see if it works? I apologize for the inconvenience. But I'm pretty sure it converges as shown in the plots.
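
One possibly related gotcha, as a side note: with argparse, `type=bool` applies `bool()` to the raw string, so passing `--her False` on the command line still evaluates to True and only the hard-coded default really matters. A minimal illustration, with a common string-to-bool workaround (the helper name is just illustrative):

```python
import argparse

# With type=bool, argparse applies bool() to the raw string:
# bool("False") == True, so `--her False` still yields True.
parser = argparse.ArgumentParser()
parser.add_argument('--her', type=bool, default=True)
print(parser.parse_args(['--her', 'False']).her)   # True, not False!

def str2bool(v):
    # Explicit string-to-bool converter (illustrative name, not from the repo).
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', '0'):
        return False
    raise argparse.ArgumentTypeError('boolean value expected')

parser2 = argparse.ArgumentParser()
parser2.add_argument('--her', type=str2bool, default=True)
print(parser2.parse_args(['--her', 'False']).her)  # False
```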

As for my issue, I get the following error: (screenshot: Screenshot from 2021-02-07 22-07-08)

I've already added that path to my bashrc, but it doesn't seem to be picking up the new changes.

kaixindelele commented 3 years ago

Did you execute this in that terminal?

```
source ~/.bashrc
```

I'm running the inverted version now, but I'm using the CPU, so the results haven't come out yet. Haha, when I first saw the default parameters in your code, they seemed strange: the number of epochs is only 50, and HER + PER are written explicitly, but why are they set to False?

sush1996 commented 3 years ago

Sure, keep me posted. And I did try that, but it's still showing me the same error.

kaixindelele commented 3 years ago

Did you add the following to your bashrc?

```
export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
```
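
After `source ~/.bashrc` in the same terminal, a quick sanity check that the path is actually picked up (a minimal sketch, assuming mujoco_py is installed in the active Python environment):

```python
# Quick check that the MuJoCo paths are visible to Python
# (run in the same terminal after `source ~/.bashrc`).
import os

print(os.environ.get("LD_LIBRARY_PATH", "<not set>"))

import mujoco_py  # this import is what usually fails when the path is wrong
print(mujoco_py.__file__)
```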

The latest performance:

```
[2021-02-08 11:05:18.451123] epoch is: 0, eval success rate is: 0.000
[2021-02-08 11:06:21.069528] epoch is: 1, eval success rate is: 0.000
[2021-02-08 11:07:27.527858] epoch is: 2, eval success rate is: 0.000
[2021-02-08 11:08:32.076572] epoch is: 3, eval success rate is: 0.000
[2021-02-08 11:09:36.614360] epoch is: 4, eval success rate is: 0.000
[2021-02-08 11:10:42.696010] epoch is: 5, eval success rate is: 0.000
[2021-02-08 11:11:46.269051] epoch is: 6, eval success rate is: 0.000
[2021-02-08 11:12:44.954278] epoch is: 7, eval success rate is: 0.000
[2021-02-08 11:13:51.055290] epoch is: 8, eval success rate is: 0.000
[2021-02-08 11:14:57.420452] epoch is: 9, eval success rate is: 0.000
[2021-02-08 11:16:01.408158] epoch is: 10, eval success rate is: 0.000
[2021-02-08 11:17:12.485674] epoch is: 11, eval success rate is: 0.000
[2021-02-08 11:18:18.625677] epoch is: 12, eval success rate is: 0.000
[2021-02-08 11:19:32.340626] epoch is: 13, eval success rate is: 0.000
[2021-02-08 11:20:48.132591] epoch is: 14, eval success rate is: 0.000
[2021-02-08 11:21:54.531380] epoch is: 15, eval success rate is: 0.000
[2021-02-08 11:23:01.774903] epoch is: 16, eval success rate is: 0.000
[2021-02-08 11:24:23.340585] epoch is: 17, eval success rate is: 0.000
[2021-02-08 11:25:38.986217] epoch is: 18, eval success rate is: 0.000
[2021-02-08 11:26:51.091589] epoch is: 19, eval success rate is: 0.000
[2021-02-08 11:28:05.992170] epoch is: 20, eval success rate is: 0.000
[2021-02-08 11:29:27.054367] epoch is: 21, eval success rate is: 0.000
```

Can I run it with mpirun -np 19?

sush1996 commented 3 years ago

(screenshot: Screenshot from 2021-02-07 22-46-07)

Ok, I figured it out: I should not have been using sudo to run the Python code (sudo resets the environment, so the LD_LIBRARY_PATH from my bashrc never reached the process). https://github.com/openai/mujoco-py/issues/267

I just tried running mine and I'm unable to replicate my posted results. I'll keep you updated, though.

kaixindelele commented 3 years ago

I also got bad results~ The performance with the default args and a single process:

```
[2021-02-08 11:05:18.451123] epoch is: 0, eval success rate is: 0.000
[2021-02-08 11:06:21.069528] epoch is: 1, eval success rate is: 0.000
[2021-02-08 11:07:27.527858] epoch is: 2, eval success rate is: 0.000
[2021-02-08 11:08:32.076572] epoch is: 3, eval success rate is: 0.000
[2021-02-08 11:09:36.614360] epoch is: 4, eval success rate is: 0.000
[2021-02-08 11:10:42.696010] epoch is: 5, eval success rate is: 0.000
[2021-02-08 11:11:46.269051] epoch is: 6, eval success rate is: 0.000
[2021-02-08 11:12:44.954278] epoch is: 7, eval success rate is: 0.000
[2021-02-08 11:13:51.055290] epoch is: 8, eval success rate is: 0.000
[2021-02-08 11:14:57.420452] epoch is: 9, eval success rate is: 0.000
[2021-02-08 11:16:01.408158] epoch is: 10, eval success rate is: 0.000
[2021-02-08 11:17:12.485674] epoch is: 11, eval success rate is: 0.000
[2021-02-08 11:18:18.625677] epoch is: 12, eval success rate is: 0.000
[2021-02-08 11:19:32.340626] epoch is: 13, eval success rate is: 0.000
[2021-02-08 11:20:48.132591] epoch is: 14, eval success rate is: 0.000
[2021-02-08 11:21:54.531380] epoch is: 15, eval success rate is: 0.000
[2021-02-08 11:23:01.774903] epoch is: 16, eval success rate is: 0.000
[2021-02-08 11:24:23.340585] epoch is: 17, eval success rate is: 0.000
[2021-02-08 11:25:38.986217] epoch is: 18, eval success rate is: 0.000
[2021-02-08 11:26:51.091589] epoch is: 19, eval success rate is: 0.000
[2021-02-08 11:28:05.992170] epoch is: 20, eval success rate is: 0.000
[2021-02-08 11:29:27.054367] epoch is: 21, eval success rate is: 0.000
[2021-02-08 11:30:58.771841] epoch is: 22, eval success rate is: 0.000
[2021-02-08 11:32:36.626336] epoch is: 23, eval success rate is: 0.000
[2021-02-08 11:34:11.696801] epoch is: 24, eval success rate is: 0.000
[2021-02-08 11:35:43.158720] epoch is: 25, eval success rate is: 0.000
[2021-02-08 11:37:15.144053] epoch is: 26, eval success rate is: 0.000
[2021-02-08 11:38:47.499482] epoch is: 27, eval success rate is: 0.000
[2021-02-08 11:40:18.300277] epoch is: 28, eval success rate is: 0.000
[2021-02-08 11:41:54.447832] epoch is: 29, eval success rate is: 0.000
[2021-02-08 11:43:30.524437] epoch is: 30, eval success rate is: 0.000
[2021-02-08 11:45:00.464543] epoch is: 31, eval success rate is: 0.000
[2021-02-08 11:46:33.471629] epoch is: 32, eval success rate is: 0.000
[2021-02-08 11:48:09.422737] epoch is: 33, eval success rate is: 0.000
[2021-02-08 11:49:51.138570] epoch is: 34, eval success rate is: 0.000
[2021-02-08 11:51:21.406536] epoch is: 35, eval success rate is: 0.000
[2021-02-08 11:53:05.610771] epoch is: 36, eval success rate is: 0.000
[2021-02-08 11:54:41.652037] epoch is: 37, eval success rate is: 0.000
[2021-02-08 11:56:15.104272] epoch is: 38, eval success rate is: 0.000
[2021-02-08 11:57:57.482927] epoch is: 39, eval success rate is: 0.000
[2021-02-08 11:59:40.078654] epoch is: 40, eval success rate is: 0.000
[2021-02-08 12:01:13.562693] epoch is: 41, eval success rate is: 0.000
[2021-02-08 12:02:50.860085] epoch is: 42, eval success rate is: 0.000
[2021-02-08 12:04:29.506913] epoch is: 43, eval success rate is: 0.000
[2021-02-08 12:06:08.477937] epoch is: 44, eval success rate is: 0.000
[2021-02-08 12:07:47.034193] epoch is: 45, eval success rate is: 0.000
[2021-02-08 12:09:25.985705] epoch is: 46, eval success rate is: 0.000
[2021-02-08 12:11:04.682959] epoch is: 47, eval success rate is: 0.000
[2021-02-08 12:12:42.998701] epoch is: 48, eval success rate is: 0.100
[2021-02-08 12:14:26.842898] epoch is: 49, eval success rate is: 0.000
```

Have you tested it using PyTorch with a GPU? And what are your environment dependencies?

sush1996 commented 3 years ago

(screenshot: Screenshot from 2021-02-08 02-02-28)

Ok, it seems to be working now. Can you change --per to True and --her to True as well, and change n_cycles to 19 instead of 1? That should do the trick. These are my final arguments:

(screenshot: Screenshot from 2021-02-08 02-02-39)
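
For context on what n_cycles controls, here is a skeletal sketch of the training loop these arguments typically map onto in DDPG+HER implementations like this one; the agent methods (collect_rollout, update, and so on) are placeholder names, not this repo's exact API:

```python
# Skeleton of how the arguments typically drive training
# (an assumed structure with placeholder agent methods, not the repo's exact code).
def train(agent, env, args):
    for epoch in range(args.n_epochs):
        for _ in range(args.n_cycles):                   # e.g. 19 collection cycles per epoch
            for _ in range(args.num_rollouts_per_mpi):   # collect episodes into the replay buffer
                episode = agent.collect_rollout(env)
                agent.buffer.store(episode)              # HER/PER act when sampling from this buffer
            for _ in range(args.n_batches):              # e.g. 40 gradient updates per cycle
                agent.update(batch_size=args.batch_size)
            agent.soft_update_targets(args.polyak)       # polyak-averaged target networks
        success_rate = agent.evaluate(env, n_episodes=args.n_test_rollouts)
        print(f"epoch {epoch}, eval success rate: {success_rate:.3f}")
```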

kaixindelele commented 3 years ago

What is your launch script?

python train.py, or mpirun -np 19 train.py? The n_cycles is 19 by default, and I did not change it~~

sush1996 commented 3 years ago

I used the latter. Also, maybe change the number of MPI runs to 2; if that doesn't help either, then I'm not really sure. I'm not using any GPU, btw.

kaixindelele commented 3 years ago

Wow... Can you share the script (args txt) and command (mpirun...) of your successful experiment? I'll run it again with your parameters~ Sorry for causing you so much trouble~

```
mpirun -np 19 python -m baselines.run --num_env=2 --alg=her
```

For baselines, this converges in about 10 epochs on FetchPush-v1. And the cycle count is 50, with num-rollout-per-mpi set to 2.

sush1996 commented 3 years ago

I've updated my repo with the latest files and the commands I used in the README section. I'll be closing this issue.