Diffusion Planning Comparison

hjsuh94 commented 1 year ago

If we frame our approach as "Uncertainty-aware Planning with Learned Dynamics", we can broadly classify different methods along three design choices:

Choice of gradient estimation: First vs. Zeroth-order.
- first-order methods have lower variance, since the variance of zeroth-order gradient estimates grows with the dimension of the decision variables.
- zeroth-order potentially has smoothing effects and is robust against exploding gradients.
Choice of transcription: Single shooting vs. Direct collocation.
- single shooting with learned dynamics suffers from compounding error in autoregressive rollouts, unless the dynamics model is trained directly on simulation (multi-step rollout) error.
- single-shooting also requires differentiation through a long trajectory, which might suffer from gradient explosion.
- direct collocation potentially overcomes some of these limitations, but is often expensive to implement.
Choice of uncertainty measure: Ensembles vs. GPs vs. DataDistance.
- ensembles underestimate uncertainty
- ensembles are compute-intensive to train
- ensembles have spurious local minima in the uncertainty landscape, which are not friendly to gradient-based optimization.
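
To make the first choice above concrete, here is a minimal sketch comparing an exact first-order gradient with a Gaussian-smoothing zeroth-order estimate. The quadratic cost, the smoothing scale `sigma`, and the sample counts are all assumptions picked for illustration; this is not the cost or estimator used in the repo.

```python
import numpy as np

def cost(u):
    # Hypothetical smooth cost: a plain quadratic in the action vector u.
    return 0.5 * np.sum(u ** 2)

def first_order_grad(u):
    # Exact gradient of the quadratic cost above.
    return u.copy()

def zeroth_order_grad(u, sigma, n_samples, rng):
    # Gaussian-smoothing estimator with baseline subtraction:
    # g = mean_i[(cost(u + sigma * w_i) - cost(u)) / sigma * w_i], w_i ~ N(0, I)
    w = rng.standard_normal((n_samples, u.size))
    deltas = 0.5 * np.sum((u + sigma * w) ** 2, axis=1) - cost(u)
    return (deltas[:, None] * w).mean(axis=0) / sigma

def estimator_error(dim, sigma=0.1, n_samples=100, n_trials=50, seed=0):
    # Mean squared error of the zeroth-order estimate against the exact gradient.
    rng = np.random.default_rng(seed)
    u = np.ones(dim)
    exact = first_order_grad(u)
    errs = [np.sum((zeroth_order_grad(u, sigma, n_samples, rng) - exact) ** 2)
            for _ in range(n_trials)]
    return float(np.mean(errs))

# Same sample budget, increasing action dimension.
errors = {dim: estimator_error(dim) for dim in (2, 20, 200)}
for dim, err in errors.items():
    print(f"dim={dim:4d}  zeroth-order squared error={err:.3f}")
```

With a fixed sample budget, the squared error of the zeroth-order estimate should grow with the dimension, while the first-order gradient is exact for this cost.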

To convincingly show the benefits of our method (first + dircol + datadistance) over popular planning approaches like MPPI with ensemble variance (zeroth + shooting + ensemble), we need to compare against the other combinations. The options are summarized as follows:

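| Gradient estimation | Uncertainty measure | Single Shooting | Direct Collocation |
|---|---|---|---|
| First-order | Ensembles | | |
| First-order | DataDistance | DRisk Trajopt | Diffusion Planning |
| Zeroth-order | Ensembles | MPPI w/ Ensembles | |
| Zeroth-order | DataDistance | | |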

hjsuh94 commented 1 year ago

List of experiments that seem critical for the paper:

When does zeroth-order not work?
- set up a high-dimensional action-space example where MPPI performs poorly because of estimator variance.
When does shooting not work? (we've already seen that it doesn't work even with distribution risk!)
- set up examples for shooting where the learned dynamics become inaccurate over autoregressive rollouts
- set up examples for shooting where, due to the long horizon, the gradients explode and become unstable.
When do ensembles not work?
- set up examples where ensembles underestimate uncertainty
- set up examples where ensembles have spurious local minima.
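
As a rough illustration of the "ensembles underestimate uncertainty" experiment, the sketch below fits a bootstrap ensemble on data from a narrow region and queries it far away. Everything here is an assumption made for the example: the scalar map `true_dynamics`, the use of linear least-squares members instead of neural networks, and nearest-neighbor distance as a stand-in for a data-distance measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_dynamics(x):
    # Hypothetical ground-truth scalar map, chosen only for illustration.
    return np.sin(3.0 * x)

# Training data covers only a narrow region of the input space.
x_train = rng.uniform(-0.5, 0.5, size=50)
y_train = true_dynamics(x_train) + 0.01 * rng.standard_normal(50)

# Ensemble of linear models, each fit on a bootstrap resample of the data.
members = []
for _ in range(10):
    idx = rng.integers(0, x_train.size, size=x_train.size)
    members.append(np.polyfit(x_train[idx], y_train[idx], deg=1))

def ensemble_predict(x):
    # Mean and standard deviation of the member predictions at x.
    preds = np.array([np.polyval(c, x) for c in members])
    return preds.mean(axis=0), preds.std(axis=0)

def data_distance(x):
    # Distance from the query to the nearest training input.
    return np.min(np.abs(x_train - x))

x_query = 3.0  # far outside the training range
mean, std = ensemble_predict(x_query)
actual_error = abs(mean - true_dynamics(x_query))
print(f"ensemble std={std:.3f}  actual error={actual_error:.3f}  "
      f"data distance={data_distance(x_query):.3f}")
```

In this toy setting the members agree with each other far from the data even though they are all wrong, so the ensemble spread stays well below the actual error, whereas the distance to the data is large at the query point.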

hjsuh94 commented 1 year ago

Bare-minimum set of examples:

Single Integrator with Obstacles
- Show that ensembles underestimate uncertainty
- Show that ensembles do not stabilize to data
- Show that sampling-based approaches like MPPI do not do well in very high-dimensional single-integrator settings.
Pendulum / Cart-pole / Acrobot
- Show that shooting predictions are not very accurate over long horizons, which leads to failure
- Show that shooting gradients blow up over long horizons (T>200?)
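
A minimal sketch of the gradient blow-up on the pendulum, assuming an explicit-Euler discretization, a zero-input rollout, and a start at the upright equilibrium. The constants `G_OVER_L` and `DT` are placeholders, and the true dynamics stand in for a learned model. It tracks the sensitivity of the terminal state to the initial state; the gradient with respect to early inputs passes through the same product of Jacobians.

```python
import numpy as np

G_OVER_L = 9.81  # assumed gravity / length ratio
DT = 0.05        # assumed Euler step size

def step(x, u):
    # Explicit-Euler pendulum step; theta = 0 is hanging down, u is a torque-like input.
    theta, omega = x
    return np.array([theta + DT * omega,
                     omega + DT * (-G_OVER_L * np.sin(theta) + u)])

def step_jacobian(x):
    # Jacobian of step() with respect to the state x.
    theta, _ = x
    return np.array([[1.0, DT],
                     [-DT * G_OVER_L * np.cos(theta), 1.0]])

def terminal_sensitivity(x0, horizon):
    # Spectral norm of d x_T / d x_0 for a zero-input rollout,
    # accumulated step by step with the chain rule.
    x = np.array(x0, dtype=float)
    J = np.eye(2)
    for _ in range(horizon):
        J = step_jacobian(x) @ J
        x = step(x, 0.0)
    return np.linalg.norm(J, 2)

x_upright = [np.pi, 0.0]
norms = {T: terminal_sensitivity(x_upright, T) for T in (10, 50, 200)}
for T, n in norms.items():
    print(f"T={T:4d}  ||dx_T/dx_0|| = {n:.3e}")
```

Around the unstable upright equilibrium the sensitivity grows roughly geometrically with the horizon, which is the behavior a T>200 shooting experiment would be expected to expose.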

hjsuh94 / score_po

Diffusion Planning Comparison #48