hjsuh94 / score_po

Score-Guided Planning

Diffusion Planning Comparison #48

Open hjsuh94 opened 1 year ago

hjsuh94 commented 1 year ago

If we frame our approach as "Uncertainty-aware Planning with Learned Dynamics", we can broadly classify different methods along three design choices:

  1. Choice of gradient estimation: first-order vs. zeroth-order.
    • first-order methods have lower variance, whereas the variance of zeroth-order gradient estimates depends on the injected sampling noise and grows with the dimension of the decision variables.
    • zeroth-order estimates can have a smoothing effect and are robust to exploding gradients.
  2. Choice of transcription: Single shooting vs. Direct collocation.
    • single-shooting with learned dynamics suffers from compounding error of autoregressive rollouts, unless dynamics is directly trained with simulation error.
    • single-shooting also requires differentiation through a long trajectory, which might suffer from gradient explosion.
    • direct collocation potentially overcomes some of these limitations, but is often expensive to implement.
  3. Choice of uncertainty measure: Ensembles vs. GPs vs. DataDistance.
    • ensembles underestimate uncertainty
    • ensembles are compute-intensive to train
    • ensembles produce an uncertainty landscape with spurious local minima, which is unfriendly to gradient-based optimization.
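
The variance claim in item 1 can be checked numerically. This is a minimal sketch under assumptions not taken from the thread: a hypothetical quadratic cost c(u) = 0.5 ||u||^2 (exact gradient u) and a standard Gaussian-smoothing zeroth-order estimator, (c(u + sigma w) - c(u)) w / sigma with w ~ N(0, I), averaged over a fixed sample budget. MPPI's update is a related but differently weighted sample average, so this is an illustration of the variance trend, not of MPPI itself. For this cost the estimator is unbiased, so its mean squared error is pure variance.

```python
import numpy as np

def cost(u):
    # Hypothetical smooth cost 0.5 * ||u||^2, evaluated along the last axis.
    return 0.5 * np.sum(u ** 2, axis=-1)

def first_order_grad(u):
    # Analytic (first-order) gradient of the quadratic cost: exact, zero variance.
    return u

def zeroth_order_grad(u, sigma, num_samples, rng):
    # Gaussian-smoothing estimate: mean of (c(u + sigma*w) - c(u)) * w / sigma.
    w = rng.standard_normal((num_samples, u.shape[0]))
    delta = cost(u + sigma * w) - cost(u)
    return (delta[:, None] * w).mean(axis=0) / sigma

def estimator_error(dim, sigma=0.1, num_samples=100, trials=200, seed=0):
    # Mean squared error of the zeroth-order estimate w.r.t. the true gradient.
    rng = np.random.default_rng(seed)
    u = np.ones(dim)
    g_true = first_order_grad(u)
    errs = [np.sum((zeroth_order_grad(u, sigma, num_samples, rng) - g_true) ** 2)
            for _ in range(trials)]
    return float(np.mean(errs))

# Same sample budget, increasing dimension of the decision variable.
for dim in (2, 20, 200):
    print(dim, estimator_error(dim))
```

With the sample budget held fixed, the error grows sharply with dimension, which is the regime the proposed high-dimensional MPPI experiment targets.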

To convincingly show the benefits of our method (first-order + direct collocation + DataDistance) over popular planning approaches such as MPPI with ensemble variance (zeroth-order + shooting + ensembles), we should compare across these design choices, which are summarized as follows:

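| Gradient | Uncertainty | Single Shooting | Direct Collocation |
|---|---|---|---|
| First-order | Ensembles | | |
| First-order | DataDistance | DRisk Trajopt | Diffusion Planning |
| Zeroth-order | Ensembles | MPPI w/ Ensembles | |
| Zeroth-order | DataDistance | | |
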
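To make the two transcription columns concrete, here is a minimal sketch on a single integrator x_{t+1} = x_t + u_t standing in for a learned dynamics model. The function names, the control weight, and the penalty weight are hypothetical choices for illustration, and the collocation version uses a quadratic penalty on dynamics defects rather than hard constraints.

```python
import numpy as np

def dynamics(x, u):
    # Single integrator x_{t+1} = x_t + u_t; a stand-in for a learned model.
    return x + u

def shooting_rollout(x0, us):
    # Single shooting: only controls are decision variables; states come from
    # autoregressively rolling out the model, so model errors compound.
    xs = [x0]
    for u in us:
        xs.append(dynamics(xs[-1], u))
    return np.stack(xs)

def shooting_cost(x0, us, goal, ctrl_weight=1e-2):
    xs = shooting_rollout(x0, us)
    return np.sum((xs[-1] - goal) ** 2) + ctrl_weight * np.sum(us ** 2)

def collocation_defects(xs, us):
    # Direct collocation: states are decision variables too, and the dynamics
    # only enter as one-step defects x_{t+1} - f(x_t, u_t).
    return xs[1:] - dynamics(xs[:-1], us)

def collocation_cost(xs, us, x0, goal, ctrl_weight=1e-2, penalty=100.0):
    # Penalty form: task cost plus squared defects and initial-state mismatch.
    task = np.sum((xs[-1] - goal) ** 2) + ctrl_weight * np.sum(us ** 2)
    defects = np.sum(collocation_defects(xs, us) ** 2) + np.sum((xs[0] - x0) ** 2)
    return task + penalty * defects

# On a dynamically consistent trajectory the defects vanish and both costs agree.
x0, goal = np.zeros(2), np.ones(2)
us = np.full((5, 2), 0.1)
xs = shooting_rollout(x0, us)
print(shooting_cost(x0, us, goal), collocation_cost(xs, us, x0, goal))
```

The design difference is that collocation never chains the model over the whole horizon: each defect involves a single model step, so gradients do not pass through a long product of Jacobians, at the cost of more decision variables.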
hjsuh94 commented 1 year ago

List of experiments that seem critical for the paper:

  1. When does zeroth-order not work?
    • set up an example with a high-dimensional action space where MPPI performs poorly because of the variance of its sampled updates.
  2. When does shooting not work? (We've already seen that it fails even with distribution risk!)
    • set up shooting examples where the learned dynamics are inaccurate due to compounding error in autoregressive rollouts.
    • set up shooting examples where, due to the long horizon, the gradients explode and become unstable.
  3. When do ensembles not work?
    • set up examples where ensembles underestimate uncertainty
    • set up examples where ensembles have spurious local minima.
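One deliberately simple toy case of ensembles underestimating uncertainty, under assumptions not taken from this thread: a bootstrap ensemble of linear least-squares models fit to noise-free 1-D data on [0, 1], and a hypothetical `data_distance` (distance to the nearest training input) standing in for DataDistance. Because every member recovers the same line, ensemble disagreement stays essentially zero even at x = 10, far outside the data, while the distance to data keeps growing. This is an illustration of the failure mode, not evidence for the paper.

```python
import numpy as np

def fit_bootstrap_ensemble(x, y, num_members=5, seed=0):
    # Each member is a least-squares line y ~ a*x + b fit on a bootstrap resample.
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(num_members):
        idx = rng.integers(0, len(x), size=len(x))
        design = np.stack([x[idx], np.ones(len(idx))], axis=1)
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        coefs.append(coef)
    return np.array(coefs)  # shape (num_members, 2): columns are (a, b)

def ensemble_std(coefs, xq):
    # Ensemble "uncertainty": std of member predictions at each query point.
    preds = coefs[:, :1] * xq[None, :] + coefs[:, 1:]
    return preds.std(axis=0)

def data_distance(x_data, xq):
    # Hypothetical stand-in for DataDistance: distance to the nearest training input.
    return np.min(np.abs(xq[:, None] - x_data[None, :]), axis=1)

x_train = np.linspace(0.0, 1.0, 50)
y_train = 2.0 * x_train + 1.0  # noise-free, perfectly linear data
coefs = fit_bootstrap_ensemble(x_train, y_train)
xq = np.array([0.5, 10.0])  # one in-distribution query, one far-away query
print(ensemble_std(coefs, xq), data_distance(x_train, xq))
```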
hjsuh94 commented 1 year ago

Minimum set of examples:

  1. Single Integrator with Obstacles

    • Show that ensembles underestimate uncertainty
    • Show that ensembles do not stabilize to data
    • Show that sampling-based approaches like MPPI do not perform well in very high-dimensional single-integrator settings.
  2. Pendulum / Cart-pole / Acrobot

    • Show that shooting predictions become inaccurate over long horizons, which leads to failure
    • Show that shooting gradients blow up over long horizons (T>200?)
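
To get a feel for the long-horizon gradient issue, here is a minimal sketch under assumptions not in the thread: a pendulum Euler-discretized with step dt and linearized about its unstable upright equilibrium (theta measured from upright, unit inertia), standing in for a learned model. For x_{t+1} = A x_t + B u_t, the single-shooting sensitivity of the final state to the first control is A^{T-1} B, so its norm grows roughly like (1 + dt * sqrt(g/l))^T. The values dt = 0.05 and g/l = 9.81 are illustrative.

```python
import numpy as np

def linearized_pendulum(dt=0.05, g_over_l=9.81):
    # Euler-discretized pendulum linearized about the upright (unstable) equilibrium.
    # State x = (theta, omega); theta'' ~ (g/l) * theta + u.
    A = np.array([[1.0, dt], [dt * g_over_l, 1.0]])
    B = np.array([[0.0], [dt]])
    return A, B

def first_control_sensitivity(T, dt=0.05, g_over_l=9.81):
    # For x_{t+1} = A x_t + B u_t, d x_T / d u_0 = A^{T-1} B.
    A, B = linearized_pendulum(dt, g_over_l)
    return float(np.linalg.norm(np.linalg.matrix_power(A, T - 1) @ B))

for T in (10, 50, 200):
    print(T, first_control_sensitivity(T))
```

With a learned model, A and B would be replaced by the network's Jacobians along the rollout, but the same product structure is what drives the blow-up at long horizons.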