rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

Cannot learn (pusher experiment) #31

Open Nicolas99-9 opened 5 years ago

Nicolas99-9 commented 5 years ago

I ran the RIG + pusher experiment with the original settings. Contrary to the paper, I cannot observe any improvement in the average return or success rate. How can I reproduce the original paper's results?

Output after 100 epochs (388004 iterations)

```
hand_distance         Mean 0.0369142   Std 0.0125764   Max 0.153462     Min 0.0280608
Final hand_distance   Mean 0.0406867   Std 0.00171668  Max 0.0432108    Min 0.0369941
puck_distance         Mean 0.14895     Std 0.0542304   Max 0.234878     Min 0.0381613
Final puck_distance   Mean 0.150567    Std 0.0544439   Max 0.234878     Min 0.0528324
touch_distance        Mean 0.0677295   Std 0.00900345  Max 0.105978     Min 0.0565952
Final touch_distance  Mean 0.0690568   Std 0.00635284  Max 0.0836148    Min 0.0630353
success               Mean 0           Std 0           Max 0            Min 0
Final success         Mean 0           Std 0           Max 0            Min 0
QF1 Loss 0.171023
QF2 Loss 0.18778
Policy Loss 104.349
Q1 Predictions        Mean -104.665    Std 88.4167     Max -4.22522     Min -336.314
Q2 Predictions        Mean -104.558    Std 88.3845     Max -4.09563     Min -335.829
Q Targets             Mean -104.699    Std 88.4681     Max -4.3236      Min -336.498
Bellman Errors 1      Mean 0.171023    Std 0.416491    Max 3.98678      Min 1.80304e-06
Bellman Errors 2      Mean 0.18778     Std 0.355425    Max 2.20607      Min 3.32488e-06
Policy Action         Mean 0.194368    Std 0.731224    Max 1            Min -1
Test Rewards          Mean -0.360232   Std 0.594497    Max -0.00478052  Min -5.77281
Test Returns          Mean -36.0232    Std 22.2522     Max -13.6433     Min -92.884
Test Actions          Mean 0.0276301   Std 0.247718    Max 1            Min -0.838328
Num Paths 10
Exploration Rewards   Mean -1.33633    Std 0.762848    Max -0.225774    Min -5.57678
Exploration Returns   Mean -133.633    Std 60.5544     Max -42.4458     Min -214.126
Exploration Actions   Mean 0.262311    Std 0.44695     Max 1            Min -1
image_dist            Mean 10.9785     Std 1.05965     Max 14.3582      Min 9.29165
Final image_dist      Mean 10.8799     Std 1.03786     Max 12.6932      Min 9.39808
image_success         Mean -0.727273   Std 0.445362    Max 0            Min -1
Final image_success   Mean -0.727273   Std 0.445362    Max 0            Min -1
vae_dist              Mean 0.360232    Std 0.594497    Max 5.77281      Min 0.00478052
Final vae_dist        Mean 0.216473    Std 0.096578    Max 0.361867     Min 0.0465295
AverageReturn -36.0232
Number of train steps total 388004
Number of env steps total 101000
Number of rollouts total 1010
Train Time (s) 323.224
(Previous) Eval Time (s) 123.294
Sample Time (s) 115.072
Epoch Time (s) 561.59
Total Train Time (s) 29454.1
Epoch 100
```

Evolution of the average return:


vitchyr commented 5 years ago

Did you modify the example script? The fact that the average returns start at around -40 seems odd. Also, I wouldn't read too much into "Average Return": distances in the latent space can be hard to interpret. Here's an example of what a run should look like, which also plots the most intuitive metrics (final hand/puck distance):

[plot: final hand/puck distance for a reference run]

I made this plot using my version of viskit. I'll run more seeds now with the latest code, but this should work consistently.
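For anyone who wants to reproduce this kind of plot without viskit, here is a minimal sketch, assuming rlkit's default CSV logger output (`progress.csv`, one per run, with the column names from the printout above); the log directory layout in the glob is a placeholder:

```python
import glob

import matplotlib.pyplot as plt
import pandas as pd

# one progress.csv per seed; adjust the glob to your own log directory
for path in glob.glob("data/rig-pusher/*/progress.csv"):
    df = pd.read_csv(path)
    plt.plot(df["Epoch"], df["Final puck_distance Mean"], label=path)

plt.xlabel("Epoch")
plt.ylabel("Final puck_distance Mean")
plt.legend()
plt.show()
```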

vitchyr commented 5 years ago

Okay, I ran more seeds, and the performance does seem to have high variance, as shown below.

[plot: final hand/puck distance across seeds]

If it weren't for the green and purple curves, it'd basically match the paper. I'll update here if I find out why the variance is so much higher, but I think this confirms that the code is mostly working.

tunglm2203 commented 4 years ago

Hi @vitchyr, I am trying to run the RIG algorithm on the Pusher environment and ran into the same problem described above. I ran 5 different seeds; the AverageReturn and Final hand_distance Mean seem to be on the same scale as yours, but the Final puck_distance Mean is different, similar to your green and purple curves.

[plot: push_rig results across seeds]

Three of the experiments (red, green, purple) are still running. How can I reproduce the results from the paper?

vitchyr commented 4 years ago

The RIG implementation currently uses the "online VAE training." However, the main experiments in the RIG paper use a pre-trained VAE.

The settings can be found on this branch, and should produce results more similar to the RIG paper: https://github.com/vitchyr/rlkit/tree/v0.1.2
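To make the distinction concrete, here is a generic PyTorch sketch of the pre-training idea, not rlkit's actual entry point (the `vae(x) -> (reconstruction, mean, logvar)` convention and flat image tensors are assumptions): the VAE is fit on a fixed image dataset first and then frozen for the rest of RL training, whereas online training keeps updating it on images from the replay buffer.

```python
import torch
from torch.utils.data import DataLoader

def pretrain_vae(vae, images, epochs=100, lr=1e-3):
    """Fit `vae` on a fixed dataset of flattened images, then freeze it."""
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    loader = DataLoader(images, batch_size=128, shuffle=True)
    for _ in range(epochs):
        for x in loader:
            recon, mu, logvar = vae(x)  # assumed return convention
            recon_loss = ((recon - x) ** 2).sum(dim=-1).mean()
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
            opt.zero_grad()
            (recon_loss + kl).backward()
            opt.step()
    for p in vae.parameters():
        p.requires_grad_(False)  # frozen: no further updates during RL
    return vae
```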

shivanikishnani commented 4 years ago

Hi! Are the results you posted https://github.com/vitchyr/rlkit/issues/31#issuecomment-462994015 based on the Online VAE or are they using a pre-trained VAE?

I've been trying to train RIG's Pusher using version 0.1.2, but my results don't look anything like the ones in the paper. I tried it with multiple seeds as well. The parameters are the same as in examples/rig/pusher/rig.py. An example of one of the results is below. If the issue is just high variance across seeds, do you know why that might be? Could you tell me which seeds you were using?

[plot: one example Pusher run]

oracle.py works fine, by the way.

RIG's Reacher ('SawyerReachXYZEnv-v1', and also 'SawyerReachXYZEnv-v0') also doesn't seem to work as indicated in the paper and has high variance. For Reacher, I'm training the VAE on 100 images, as indicated in the paper, for 100 epochs. I'm also running the full algorithm for 100 epochs. The other parameters are the same as for Pusher.

[plot: Reacher learning curves]

I'd really appreciate it if you could let me know if something is wrong. Thanks!
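As an aside on the seed question: the usual recipe for pinning seeds in a PyTorch/NumPy stack is sketched below with generic names; how rlkit's launcher threads its own seed through the environment and replay buffer is a separate question.

```python
import random

import numpy as np
import torch

def set_global_seed(seed: int) -> None:
    # seed every RNG the stack might touch
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```

Even with all of these pinned, GPU nondeterminism and environment resets can still make runs diverge, so identical seeds do not guarantee identical curves.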

vitchyr commented 4 years ago

@shivanikishnani @tunglm2203 What versions of python, pytorch, mujoco, and multiworld are you using? I have a slight suspicion that the versions may be related to the difference in performance; I'm not sure what else could be the cause.

I'm using the following:

shivanikishnani commented 4 years ago

@vitchyr I tested it with your linux-gpu-env.yml's environment as well, which had the same package versions. I was using the most recent version of multiworld, but have now installed the package at that commit.

I had to install some libraries that were missing from your environment specification but are needed to run the experiments, including torchvision.* Installing torchvision also pulled in a different version of pytorch (1.3.0), which is what ended up being used. What version of torchvision are you using, and how did you install it without affecting your pytorch installation? It's used in vae_trainer.py.

*Let me know if you want me to create a pull request with an updated yml file.

vitchyr commented 4 years ago

I was using torchvision version 0.2.0. Can you let me know if the old version of multiworld makes a difference? If so, a PR would be great.
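One common way to add torchvision without disturbing a pinned pytorch is `pip install torchvision==0.2.0 --no-deps`, which skips dependency resolution (so pip won't pull in a newer torch) at the cost of doing the compatibility check yourself. A quick sanity check after installing:

```python
# verify that the pinned pair survived the install
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)  # expecting 0.2.0
```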


shivanikishnani commented 4 years ago

It was either the old version of multiworld or the old version of pytorch that made the difference; either way, the experiments seem to be learning now.

However, they seem to have much higher variance than the ones you posted above. I'm running with multiple seeds to see if those make a difference. The final hand distance for Pusher is given below; for some reason, it's not plotting together with Reacher. I'm smoothing out the curves.

[plots: smoothed learning curves for Pusher and Reacher]
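For the curve smoothing, one simple option is a centered moving average over the logged column; in this sketch the path and window size are placeholders:

```python
import pandas as pd

df = pd.read_csv("progress.csv")  # one run's rlkit log
raw = df["Final hand_distance Mean"]
# centered rolling mean; window=10 is a judgment call
smoothed = raw.rolling(window=10, center=True, min_periods=1).mean()
```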

vitchyr commented 4 years ago

@shivanikishnani Did running multiple seeds make a difference?