personalrobotics / CCIL

Code release and project site for "CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning"
https://personalrobotics.github.io/CCIL/

Unable to reproduce the PendulumSwingup results #3

Open dadadadawjb opened 2 months ago

dadadadawjb commented 2 months ago

Hi team,

Thanks for sharing the great work! I tried reproducing the PendulumSwingup experiments, both continuous and discontinuous, using the provided scripts and code without any modification. However, the results do not match the performance shown in Figure 3(c) of the paper "CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning". Do any hyperparameters need tuning, or is there anything else I need to change to get those results?

[Image: pendulum_results]

Thanks a lot!

abhaybd commented 2 months ago

Thanks for your interest in our work! The exact numbers vary depending on the seeds tested, but CCIL should almost always outperform BC on the pendulum task. Please ensure you're running with the hyperparameters specified in the corresponding .yml files.

Running the following command on my machine:

./scripts/train_ccil.sh "pendulum_cont pendulum_disc" "40 41 42 43 44 45 46 47 48 49" 0.0001

yields the following results:

+-------------------------------------------------------+-----------+---------------+-----------+
| Task                                                  |     Score |   Score (std) |   # seeds |
+=======================================================+===========+===============+===========+
| PendulumSwingupCont-v0_naive                          | -3335.941 |       132.394 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+
| PendulumSwingupCont-v0_noisy_action_soft_samplingL2.0 | -2527.913 |       363.101 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+
| PendulumSwingupDisc-v0_noisy_action_slackL2.0         | -2794.500 |       400.131 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+
| PendulumSwingupDisc-v0_naive                          | -3001.869 |       231.012 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+

As you can see, the exact numbers change due to the associated variance, but CCIL still outperforms standard BC.
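One way to gauge how much of each gap could be seed noise is Welch's t-test computed directly from the summary statistics in the table above. The sketch below is illustrative and not part of the repository: `welch_t` is a hypothetical helper, and it assumes that "Score (std)" is the sample standard deviation across the 10 seeds and that per-seed scores are roughly normally distributed.

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Hypothetical helper: works from summary statistics only and assumes
    std_a / std_b are sample standard deviations across seeds.
    """
    var_a = std_a ** 2 / n_a  # squared standard error of mean A
    var_b = std_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Values copied from the table above: CCIL variant first, naive BC second.
cont_t, cont_df = welch_t(-2527.913, 363.101, 10, -3335.941, 132.394, 10)
disc_t, disc_df = welch_t(-2794.500, 400.131, 10, -3001.869, 231.012, 10)

print(f"PendulumSwingupCont: t = {cont_t:.2f}, df = {cont_df:.1f}")
print(f"PendulumSwingupDisc: t = {disc_t:.2f}, df = {disc_df:.1f}")
```

Under those assumptions, the continuous task gives a t statistic of about 6.6, while the discontinuous task gives about 1.4, which is below the two-sided 5% critical value of roughly 2.1 at about 14 degrees of freedom. Read that way, the continuous gap is clear, and the discontinuous gap is harder to separate from seed noise with 10 seeds.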

dadadadawjb commented 2 months ago

Thanks for your prompt reply! I see, but on my machine the gap between CCIL (-2912.906) and naive BC (-2978.408) on the discontinuous Pendulum is hard to distinguish, even with the same 10 random seeds ("40 41 42 43 44 45 46 47 48 49"). Do you have any suggestions?
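For a rough sense of scale, the normal-approximation sample-size formula estimates how many seeds per method a two-sided test would need to detect a gap of this size. This is a sketch under loud assumptions, not a result from the paper or repository: `seeds_per_method` is a hypothetical helper, the per-seed standard deviations are borrowed from the maintainer's table above (400.131 and 231.012) because none were reported for this run, and the 5% significance level and 80% power are conventional defaults.

```python
import math
from statistics import NormalDist


def seeds_per_method(gap, std_a, std_b, alpha=0.05, power=0.8):
    """Approximate seeds per method needed to detect `gap` between two means.

    Hypothetical helper using the normal approximation for a two-sided,
    two-sample comparison with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    variance_sum = std_a ** 2 + std_b ** 2         # variance of one paired difference of draws
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / gap ** 2)


# Gap between the two discontinuous-task means quoted in this comment.
gap = -2912.906 - (-2978.408)

# Assumption: standard deviations taken from the maintainer's table.
n_seeds = seeds_per_method(gap, 400.131, 231.012)
print(f"gap = {gap:.3f}, approx. seeds per method = {n_seeds}")
```

With those borrowed standard deviations, the estimate comes out at several hundred seeds per method, so a gap this small would not be expected to show up reliably with 10 seeds.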

Kelym commented 1 month ago

Thanks for bringing this to our attention. It seems there is more variance on PendulumDiscontinuous than we initially realized. (We validated our config on 10 random seeds and 2 machines.) If we can reproduce a run that lacks the performance gap, we may be able to sweep the parameters from there and update them.

In the meantime, have you had a chance to verify the performance on the other task suite? We just want to double-check whether this is only a stochasticity problem in PendulumDiscontinuous or whether there is more to it.