Paper | Review | Experiment video | 5min presentation at CoRL 2020
This repository contains code for synthetic training of the robotic tasks in the paper.
Although the code for all examples is included here, only the pushing example can be run without additional code or resources. The other two examples require data from an online object dataset and object post-processing, which can take a significant amount of time to set up and involves licensing. Meanwhile, all objects (rectangular boxes) used for the pushing example can be generated as URDF files (generateBox.py).
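As a rough illustration of what generating rectangular boxes as URDF files involves, the sketch below writes single-link box URDFs with random dimensions. It is not the repository's generateBox.py: the function names, the size range, the mass, and the file naming are all assumptions made for this example.

```python
import os
import random


def box_urdf(name, size_xyz, mass):
    """Return URDF text for a single-link rectangular box (meters, kilograms)."""
    x, y, z = size_xyz
    # Moments of inertia of a uniform solid box about its center of mass.
    ixx = mass * (y * y + z * z) / 12.0
    iyy = mass * (x * x + z * z) / 12.0
    izz = mass * (x * x + y * y) / 12.0
    size = f"{x:.4f} {y:.4f} {z:.4f}"
    return f"""<?xml version="1.0"?>
<robot name="{name}">
  <link name="base_link">
    <inertial>
      <origin xyz="0 0 0" rpy="0 0 0"/>
      <mass value="{mass}"/>
      <inertia ixx="{ixx:.8f}" ixy="0" ixz="0" iyy="{iyy:.8f}" iyz="0" izz="{izz:.8f}"/>
    </inertial>
    <visual>
      <geometry><box size="{size}"/></geometry>
    </visual>
    <collision>
      <geometry><box size="{size}"/></geometry>
    </collision>
  </link>
</robot>
"""


def write_boxes(obj_folder, count, seed=0):
    """Write `count` random box URDFs into obj_folder and return their paths."""
    rng = random.Random(seed)
    os.makedirs(obj_folder, exist_ok=True)
    paths = []
    for i in range(count):
        # Hypothetical side-length range and mass, chosen only for illustration.
        size = tuple(rng.uniform(0.04, 0.10) for _ in range(3))
        path = os.path.join(obj_folder, f"{i}.urdf")
        with open(path, "w") as f:
            f.write(box_urdf(f"box_{i}", size, mass=0.1))
        paths.append(path)
    return paths
```

Calling write_boxes("boxes", 10) would create ten files under a boxes folder; a simulator that reads URDF should be able to load them, although the parameters the paper actually uses may differ.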
Moreover, we provide the pre-trained weights for the decoder network of the cVAE for the pushing example. The posterior policy distribution can then be trained using these weights and the prior distribution (unit Gaussians).
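To make the roles of the prior and posterior concrete, here is a minimal sketch that assumes the posterior is a diagonal Gaussian over the latent input of the decoder. It samples a latent vector and computes the KL divergence to the unit-Gaussian prior, a quantity that typically appears in PAC-Bayes bounds. The parameterization by mean and log standard deviation, the function names, and the latent dimension are assumptions, not taken from the repository.

```python
import math
import random


def kl_to_unit_gaussian(mu, log_std):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) with sigma = exp(log_std)."""
    total = 0.0
    for m, ls in zip(mu, log_std):
        var = math.exp(2.0 * ls)
        # Per dimension: 0.5 * (sigma^2 + mu^2 - 1) - log(sigma)
        total += 0.5 * (var + m * m - 1.0) - ls
    return total


def sample_latent(mu, log_std, rng):
    """Draw one latent vector z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_std)]


# The prior itself (zero mean, unit variance) has zero divergence from itself.
latent_dim = 4  # hypothetical dimension, for illustration only
prior_mu = [0.0] * latent_dim
prior_log_std = [0.0] * latent_dim
z = sample_latent(prior_mu, prior_log_std, random.Random(0))
```

Each sampled z would be fed to the pre-trained decoder to produce a policy; training the posterior then amounts to adjusting mu and log_std while keeping the divergence from the prior small.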
Install the dependencies with pip install (with python=3.7).

The scripts follow a naming convention:
..._bc.py is for behavioral cloning training using collected demonstrations.
..._es.py is for PAC-Bayes "fine-tuning" using Natural Evolutionary Strategies. It also computes the final bound at the end of training.
..._bound.py is for computing the final bound.

Generate the boxes by running python generateBox.py --obj_folder=... and specifying the path to the generated object URDF files.
Set obj_folder in push_pac_easy.json and push_pac_hard.json accordingly.
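Editing obj_folder in both json files by hand is simple, but it can also be scripted. The helper below is a hypothetical convenience, not part of the repository. Because the layout of the config files is not shown here, it searches the whole JSON tree for the key instead of assuming it sits at the top level.

```python
import json


def set_key(node, key, value):
    """Set every occurrence of `key` in a nested JSON structure; return the count."""
    count = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                count += 1
            else:
                count += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            count += set_key(item, key, value)
    return count


def update_config(config_path, key, value):
    """Rewrite one json config in place, failing loudly if the key is absent."""
    with open(config_path) as f:
        config = json.load(f)
    count = set_key(config, key, value)
    if count == 0:
        raise KeyError(f"{key!r} not found in {config_path}")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return count
```

For example, update_config("push_pac_easy.json", "obj_folder", path_to_boxes) followed by the same call for push_pac_hard.json would point both configs at the generated boxes. Note that json.dump rewrites the file with its own indentation.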
Train by running python trainPush_es.py push_pac_easy (or push_pac_hard). The final bound is also computed by specifying L (the number of policies sampled for each environment when computing the sample convergence bound) in the json file. (Note: the default number of training environments is 1000, as set in the json files. With num_cpus=20 on a moderately powerful desktop, each training step takes 20 minutes. We recommend training on an Amazon AWS c5.24xlarge instance, which has 96 threads. Also, a useful final bound requires a large L and can take significant computation.)

To test rollouts, run python testPushRollout.py --obj_folder=... --posterior_path=.... If posterior_path is not provided, the prior policy distribution (unit Gaussians) is used. Otherwise, the path should be push_result/push_pac_easy/train_details (or hard).

(Note: we do not plan to release instructions to replicate the results of the indoor navigation example in the near future. We plan to refine the simulation in a future version of the paper.)