Closed kaixindelele closed 3 years ago
The performance gain when using PER with the DDPG+HER scheme is negligible for FetchReach-v1, but it does slightly better on FetchPickAndPlace-v1 and a lot better on FetchSlide-v1. I shall update the repo with the plots soon.
Thank you for your reply. I ran your code for about 400 epochs on FetchPush-v1, and the eval performance is:
```
[2021-02-08 09:57:31.051204] epoch is: 393, eval success rate is: 0.300
[2021-02-08 10:00:08.442811] epoch is: 394, eval success rate is: 0.500
[2021-02-08 10:02:41.274267] epoch is: 395, eval success rate is: 0.200
[2021-02-08 10:05:16.168315] epoch is: 396, eval success rate is: 0.400
[2021-02-08 10:07:52.791192] epoch is: 397, eval success rate is: 0.100
[2021-02-08 10:10:25.656766] epoch is: 398, eval success rate is: 0.400
[2021-02-08 10:12:56.899515] epoch is: 399, eval success rate is: 0.100
[2021-02-08 10:15:29.850096] epoch is: 400, eval success rate is: 0.100
[2021-02-08 10:17:31.082211] epoch is: 401, eval success rate is: 0.200
[2021-02-08 10:19:18.113382] epoch is: 402, eval success rate is: 0.200
[2021-02-08 10:21:57.569483] epoch is: 403, eval success rate is: 0.200
```
But with the original baselines library, almost all Fetch envs can reach a 0.6–1.0 success rate within 300 epochs, with a single process and a single env.
I rendered the rollouts and looked into the relabeled reward function. At the beginning of training, the agent cannot touch the cube, so the cube position stays fixed from beginning to end. No matter how you modify the goal, the agent considers the current position already the best: every relabeled goal counts as achieved without touching the cube. This kind of data accounts for 80% of the buffer, and the remaining 20% is the original sparse-reward data, which makes it impossible for the agent to learn a good policy.
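To illustrate the point above, here is a minimal sketch (my own toy code, not this repo's implementation) of a Fetch-style sparse reward under HER `future` relabeling when the cube never moves: every relabeled transition ends up looking like a success, so the buffer fills with uninformative "perfect" data.

```python
import numpy as np

# Fetch-style sparse reward: 0 on success, -1 otherwise
# (threshold value is illustrative).
def sparse_reward(achieved_goal, goal, threshold=0.05):
    d = np.linalg.norm(achieved_goal - goal)
    return 0.0 if d < threshold else -1.0

# If the gripper never touches the cube, the achieved goal
# (the cube position) is identical at every timestep.
cube_pos = np.array([1.3, 0.7, 0.42])
episode_achieved = [cube_pos.copy() for _ in range(50)]

# HER 'future' relabeling substitutes a goal achieved later in the
# same episode -- but every achieved goal here is cube_pos, so every
# relabeled transition gets reward 0 ("success" without any motion).
relabeled_rewards = [sparse_reward(ag, episode_achieved[-1])
                     for ag in episode_achieved]
print(all(r == 0.0 for r in relabeled_rewards))  # True
```

This is why a buffer dominated by no-contact episodes gives the critic almost no gradient signal toward actually manipulating the object.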
I used to wonder whether PER could amplify the effect of the few transitions that do contact the cube, but now I find the result is not even as good as baselines~
Maybe I missed some details?
Oh, that's interesting. It's been a while since I ran this program, and I'm unable to run it on my system right now due to some issues with the mujoco_py path in my bashrc. Once I resolve that, I can try simulating it and get back to you - I'll get on this soon. I've put up snapshots of the plots; you can see that for FetchPush-v1, it converges in just about 50 epochs.
I've uploaded the final models for each environment, I honestly don't remember what their performance was like. But I've uploaded it anyway, in case you find it helpful.
Some bugs with mujoco_py? Maybe I can help you~ These are my args:
```python
def get_args():
    parser = argparse.ArgumentParser()
    # the environment setting
    parser.add_argument('--env-name', type=str, default='FetchPush-v1', help='the environment name')
    parser.add_argument('--n-epochs', type=int, default=2000, help='the number of epochs to train the agent')
    parser.add_argument('--n-cycles', type=int, default=19, help='the times to collect samples per epoch')
    parser.add_argument('--n-batches', type=int, default=40, help='the times to update the network')
    parser.add_argument('--save-interval', type=int, default=5, help='the interval that save the trajectory')
    parser.add_argument('--seed', type=int, default=123, help='random seed')
    parser.add_argument('--num-workers', type=int, default=1, help='the number of cpus to collect samples')
    parser.add_argument('--replay-strategy', type=str, default='future', help='the HER strategy')
    parser.add_argument('--clip-return', type=float, default=50, help='if clip the returns')
    parser.add_argument('--save-dir', type=str, default='saved_models/', help='the path to save the models')
    parser.add_argument('--noise-eps', type=float, default=0.2, help='noise eps')
    parser.add_argument('--random-eps', type=float, default=0.3, help='random eps')
    parser.add_argument('--buffer-size', type=int, default=int(1e6), help='the size of the buffer')
    parser.add_argument('--replay-k', type=int, default=4, help='ratio to be replace')
    parser.add_argument('--clip-obs', type=float, default=200, help='the clip ratio')
    parser.add_argument('--batch-size', type=int, default=256, help='the sample batch size')
    parser.add_argument('--gamma', type=float, default=0.98, help='the discount factor')
    parser.add_argument('--action-l2', type=float, default=1, help='l2 reg')
    parser.add_argument('--lr-actor', type=float, default=0.001, help='the learning rate of the actor')
    parser.add_argument('--lr-critic', type=float, default=0.001, help='the learning rate of the critic')
    parser.add_argument('--polyak', type=float, default=0.95, help='the average coefficient')
    parser.add_argument('--n-test-rollouts', type=int, default=10, help='the number of tests')
    parser.add_argument('--clip-range', type=float, default=5, help='the clip range')
    parser.add_argument('--demo-length', type=int, default=20, help='the demo length')
    parser.add_argument('--cuda', type=bool, default=False, help='if use gpu do the acceleration')
    parser.add_argument('--num-rollouts-per-mpi', type=int, default=2, help='the rollouts per mpi')
    parser.add_argument('--her', type=bool, default=True, help='is HER True or False')
    parser.add_argument('--per', type=bool, default=True, help='is PER True or False')
```
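One caution about the args above: `--her`, `--per`, and `--cuda` use `type=bool`, which is a well-known argparse pitfall. argparse calls `bool("False")`, and any non-empty string is truthy, so passing `--her False` on the command line still yields `True`. A sketch of the pitfall and a common workaround (`str2bool` is my own helper, not from this repo):

```python
import argparse

# With type=bool, argparse calls bool("False") -> True, because any
# non-empty string is truthy. An explicit converter avoids this:
def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError("boolean value expected")

parser = argparse.ArgumentParser()
parser.add_argument("--her", type=str2bool, default=True)
parser.add_argument("--per", type=str2bool, default=True)

args = parser.parse_args(["--her", "False", "--per", "True"])
print(args.her, args.per)  # False True
```

Only the in-code `default=` values reliably take effect with `type=bool`; command-line overrides of these flags do not behave as expected.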
If the agent can converge in just about 50 epochs, that would be perfect!
Oh, if I remember right, my boolean arguments might have been inverted, and I never bothered correcting it because I was in a hurry to submit this for my coursework. Do you mind playing around with the --per and --her arguments to see if it works? I apologize for the inconvenience. But I am pretty sure that it converges as shown in the plots.
As for my issue, I get the following error:
I've already added that path in my bashrc, but it doesn't seem to pick up the new changes.
Did you execute this in that terminal?
```shell
source ~/.bashrc
```
I'm running the inverted version, but on CPU, and the result isn't out yet. Haha, when I first saw the default parameters in your code, they seemed very strange. For one, epochs is only 50, and HER + PER are explicitly written, but why are they set to False?
Sure, keep me posted. And I did try that, but it's still showing me the same error.
Did you add the following to your bashrc?
```shell
export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
```
The latest performance:
```
[2021-02-08 11:05:18.451123] epoch is: 0, eval success rate is: 0.000
[2021-02-08 11:06:21.069528] epoch is: 1, eval success rate is: 0.000
[2021-02-08 11:07:27.527858] epoch is: 2, eval success rate is: 0.000
[2021-02-08 11:08:32.076572] epoch is: 3, eval success rate is: 0.000
[2021-02-08 11:09:36.614360] epoch is: 4, eval success rate is: 0.000
[2021-02-08 11:10:42.696010] epoch is: 5, eval success rate is: 0.000
[2021-02-08 11:11:46.269051] epoch is: 6, eval success rate is: 0.000
[2021-02-08 11:12:44.954278] epoch is: 7, eval success rate is: 0.000
[2021-02-08 11:13:51.055290] epoch is: 8, eval success rate is: 0.000
[2021-02-08 11:14:57.420452] epoch is: 9, eval success rate is: 0.000
[2021-02-08 11:16:01.408158] epoch is: 10, eval success rate is: 0.000
[2021-02-08 11:17:12.485674] epoch is: 11, eval success rate is: 0.000
[2021-02-08 11:18:18.625677] epoch is: 12, eval success rate is: 0.000
[2021-02-08 11:19:32.340626] epoch is: 13, eval success rate is: 0.000
[2021-02-08 11:20:48.132591] epoch is: 14, eval success rate is: 0.000
[2021-02-08 11:21:54.531380] epoch is: 15, eval success rate is: 0.000
[2021-02-08 11:23:01.774903] epoch is: 16, eval success rate is: 0.000
[2021-02-08 11:24:23.340585] epoch is: 17, eval success rate is: 0.000
[2021-02-08 11:25:38.986217] epoch is: 18, eval success rate is: 0.000
[2021-02-08 11:26:51.091589] epoch is: 19, eval success rate is: 0.000
[2021-02-08 11:28:05.992170] epoch is: 20, eval success rate is: 0.000
[2021-02-08 11:29:27.054367] epoch is: 21, eval success rate is: 0.000
```
Should I run it with `mpirun -np 19`?
Ok I figured it out. I should not have been using sudo to run the python code. https://github.com/openai/mujoco-py/issues/267
I just tried running mine and I am unable to replicate my posted results. I shall keep you updated, however.
I also got bad results~ The performance of the default args with 1 process:
```
[2021-02-08 11:05:18.451123] epoch is: 0, eval success rate is: 0.000
[2021-02-08 11:06:21.069528] epoch is: 1, eval success rate is: 0.000
[2021-02-08 11:07:27.527858] epoch is: 2, eval success rate is: 0.000
[2021-02-08 11:08:32.076572] epoch is: 3, eval success rate is: 0.000
[2021-02-08 11:09:36.614360] epoch is: 4, eval success rate is: 0.000
[2021-02-08 11:10:42.696010] epoch is: 5, eval success rate is: 0.000
[2021-02-08 11:11:46.269051] epoch is: 6, eval success rate is: 0.000
[2021-02-08 11:12:44.954278] epoch is: 7, eval success rate is: 0.000
[2021-02-08 11:13:51.055290] epoch is: 8, eval success rate is: 0.000
[2021-02-08 11:14:57.420452] epoch is: 9, eval success rate is: 0.000
[2021-02-08 11:16:01.408158] epoch is: 10, eval success rate is: 0.000
[2021-02-08 11:17:12.485674] epoch is: 11, eval success rate is: 0.000
[2021-02-08 11:18:18.625677] epoch is: 12, eval success rate is: 0.000
[2021-02-08 11:19:32.340626] epoch is: 13, eval success rate is: 0.000
[2021-02-08 11:20:48.132591] epoch is: 14, eval success rate is: 0.000
[2021-02-08 11:21:54.531380] epoch is: 15, eval success rate is: 0.000
[2021-02-08 11:23:01.774903] epoch is: 16, eval success rate is: 0.000
[2021-02-08 11:24:23.340585] epoch is: 17, eval success rate is: 0.000
[2021-02-08 11:25:38.986217] epoch is: 18, eval success rate is: 0.000
[2021-02-08 11:26:51.091589] epoch is: 19, eval success rate is: 0.000
[2021-02-08 11:28:05.992170] epoch is: 20, eval success rate is: 0.000
[2021-02-08 11:29:27.054367] epoch is: 21, eval success rate is: 0.000
[2021-02-08 11:30:58.771841] epoch is: 22, eval success rate is: 0.000
[2021-02-08 11:32:36.626336] epoch is: 23, eval success rate is: 0.000
[2021-02-08 11:34:11.696801] epoch is: 24, eval success rate is: 0.000
[2021-02-08 11:35:43.158720] epoch is: 25, eval success rate is: 0.000
[2021-02-08 11:37:15.144053] epoch is: 26, eval success rate is: 0.000
[2021-02-08 11:38:47.499482] epoch is: 27, eval success rate is: 0.000
[2021-02-08 11:40:18.300277] epoch is: 28, eval success rate is: 0.000
[2021-02-08 11:41:54.447832] epoch is: 29, eval success rate is: 0.000
[2021-02-08 11:43:30.524437] epoch is: 30, eval success rate is: 0.000
[2021-02-08 11:45:00.464543] epoch is: 31, eval success rate is: 0.000
[2021-02-08 11:46:33.471629] epoch is: 32, eval success rate is: 0.000
[2021-02-08 11:48:09.422737] epoch is: 33, eval success rate is: 0.000
[2021-02-08 11:49:51.138570] epoch is: 34, eval success rate is: 0.000
[2021-02-08 11:51:21.406536] epoch is: 35, eval success rate is: 0.000
[2021-02-08 11:53:05.610771] epoch is: 36, eval success rate is: 0.000
[2021-02-08 11:54:41.652037] epoch is: 37, eval success rate is: 0.000
[2021-02-08 11:56:15.104272] epoch is: 38, eval success rate is: 0.000
[2021-02-08 11:57:57.482927] epoch is: 39, eval success rate is: 0.000
[2021-02-08 11:59:40.078654] epoch is: 40, eval success rate is: 0.000
[2021-02-08 12:01:13.562693] epoch is: 41, eval success rate is: 0.000
[2021-02-08 12:02:50.860085] epoch is: 42, eval success rate is: 0.000
[2021-02-08 12:04:29.506913] epoch is: 43, eval success rate is: 0.000
[2021-02-08 12:06:08.477937] epoch is: 44, eval success rate is: 0.000
[2021-02-08 12:07:47.034193] epoch is: 45, eval success rate is: 0.000
[2021-02-08 12:09:25.985705] epoch is: 46, eval success rate is: 0.000
[2021-02-08 12:11:04.682959] epoch is: 47, eval success rate is: 0.000
[2021-02-08 12:12:42.998701] epoch is: 48, eval success rate is: 0.100
[2021-02-08 12:14:26.842898] epoch is: 49, eval success rate is: 0.000
```
Have you tested with PyTorch on GPU? What are your environment dependencies?
Ok, it seems to be working now. Can you change --per and --her to True, and change n_cycles to 19 instead of 1? That should do the trick. These are my final arguments:
What is your launch script?
python train.py, or mpirun -np 19 python train.py? n_cycles is 19 by default, and I did not change it~~
I used the latter. Also, maybe change the number of MPI processes to 2; if that doesn't help either, then I am not really sure. I am not using any GPU, btw.
Wow... Can you share the script (args txt) and command (mpirun...) of your successful experiment? I'll run it again with your parameters~ Sorry for all the trouble~
```shell
mpirun -np 19 python -m baselines.run --num_env=2 --alg=her
```
For baselines, this converges in about 10 epochs for FetchPush-v1. And the cycle count is 50, with num-rollouts-per-mpi set to 2.
I've updated my repo with the latest files and the commands I used in the README. I'll be closing this issue.
Hi, I also want to combine PER+HER with DDPG, because in the Fetch envs the agent cannot reach the cube during early exploration, which really wastes time. So, can PER improve the performance of DDPG?
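For reference, the intuition behind PER here is that the rare transitions where the gripper actually contacts the cube produce large TD errors and should be replayed more often. A minimal sketch of proportional prioritized sampling (in the spirit of Schaul et al.'s PER, not this repo's implementation; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# TD errors for 5 stored transitions: one rare "contact" transition
# has a much larger error than the four no-contact ones.
td_errors = np.array([0.01, 0.01, 0.01, 0.01, 2.0])
alpha, beta, eps = 0.6, 0.4, 1e-6

# proportional priorities: p_i = (|delta_i| + eps) ** alpha
priorities = (np.abs(td_errors) + eps) ** alpha
probs = priorities / priorities.sum()

# sample a batch of indices according to priority
batch = rng.choice(len(td_errors), size=1000, p=probs)

# importance-sampling weights correct the bias from non-uniform
# sampling; normalized by the max for stability
weights = (len(td_errors) * probs) ** (-beta)
weights /= weights.max()

# the high-error "contact" transition dominates the sampled batch
print((batch == 4).mean())
```

Whether this actually helps DDPG+HER on Fetch seems task-dependent, as the results earlier in this thread suggest: if 80% of the buffer is trivially "successful" relabeled data, the priorities at least concentrate updates on the informative minority.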