ryanhoque / thriftydagger

Code for ThriftyDAgger

Can't reproduce full-autonomy peg insertion results #1

Closed madelineliao closed 2 years ago

madelineliao commented 2 years ago

Hello! I trained ThriftyDAgger on the peg insertion task with robosuite-30.pkl as the initial data and ran 5 epochs of training with interventions (so 10k episodes). I didn't change any hyperparameters. After training, I evaluated the learned model (using the test_agent() function) over 100 rollouts, but I can't seem to reproduce the "Auto Succ." success rate in Table 1 of the paper: I get a success rate of 0.29.

I tried this both with human interventions (I provided the intervention data) and the algorithmic supervisor interventions, with the same result. Might I be doing something wrong?

These are the commands that I ran:

  1. I moved run_thriftydagger.py out of /scripts into the main directory (the module structure of the repo seemed to expect that it be run from outside /scripts).
  2. I ran python run_thriftydagger.py with the default arguments and hyperparameters (using robosuite-30.pkl as the initial training data).
  3. I edited the function thrifty() in thrifty/thriftydagger.py to have a default argument of num_test_episodes=100, which is the number of autonomous-only rollouts tested in Table 1 of the paper. I then ran python run_thriftydagger.py --eval <saved_model_path_here> --iters 0 to run evaluation-only.
  4. These steps resulted in a success rate of 0.29.
leader1313 commented 2 years ago

I also ran an experiment with the same parameters as described in the paper, and I likewise could not reproduce the results.

ryanhoque commented 2 years ago

Thanks for bringing this up. The tricky thing about imitation learning is that performance is sensitive to the human demonstrations: since each person demonstrates the task in subtly different ways, we can't expect performance to be the same across demonstrators. You could try training on the dataset I collected when I ran the experiments, available here, or use the model I trained, available here. Regarding the algorithmic supervisor, it's a beta version that wasn't evaluated in the paper, so it's not expected to give strong results.