If we frame our approach as "uncertainty-aware planning with learned dynamics", we can broadly classify different methods along three design choices, summarized as follows:
Choice of gradient estimation: first-order vs. zeroth-order.
First-order methods have lower variance; zeroth-order gradients are estimated from samples, and their variance grows with the dimension of the decision variables.
Zeroth-order methods can have a smoothing effect and are robust against exploding gradients.
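The variance gap between the two estimators can be illustrated on a toy problem. The sketch below assumes a quadratic cost and Gaussian smoothing with scale sigma; the cost and all function names are hypothetical and chosen only for illustration, not taken from our method. Both estimators target the gradient of the same smoothed cost, which for this quadratic equals 2x. The sketch does not cover the exploding-gradient case, where the comparison can go the other way.

```python
import random

def cost(x):
    # Toy quadratic cost (assumption, stands in for a planning objective).
    return sum(xi * xi for xi in x)

def cost_grad(x):
    # Analytic gradient of the toy cost.
    return [2.0 * xi for xi in x]

def first_order_grad(x, sigma, n_samples, rng):
    # Average the analytic gradient over Gaussian perturbations of x.
    d = len(x)
    g = [0.0] * d
    for _ in range(n_samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        gi = cost_grad([xi + sigma * wi for xi, wi in zip(x, w)])
        g = [a + b / n_samples for a, b in zip(g, gi)]
    return g

def zeroth_order_grad(x, sigma, n_samples, rng):
    # Estimate the gradient from cost values only. Subtracting cost(x)
    # is a baseline: it leaves the expectation unchanged but lowers variance.
    d = len(x)
    base = cost(x)
    g = [0.0] * d
    for _ in range(n_samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        delta = cost([xi + sigma * wi for xi, wi in zip(x, w)]) - base
        g = [a + delta * wi / (sigma * n_samples) for a, wi in zip(g, w)]
    return g

def mean_sq_error(estimator, x, sigma, n_samples, n_trials, seed=0):
    # Mean squared distance between the estimate and the exact gradient 2x.
    rng = random.Random(seed)
    exact = cost_grad(x)
    total = 0.0
    for _ in range(n_trials):
        g = estimator(x, sigma, n_samples, rng)
        total += sum((a - b) ** 2 for a, b in zip(g, exact))
    return total / n_trials

x = [1.0] * 10
err_first = mean_sq_error(first_order_grad, x, 0.1, 16, 100)
err_zeroth = mean_sq_error(zeroth_order_grad, x, 0.1, 16, 100)
print(err_first, err_zeroth)
```

With the same number of samples, the zeroth-order error should come out much larger on this example, and it should grow further as the length of x increases.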
Choice of transcription: single shooting vs. direct collocation.
Single shooting with learned dynamics suffers from compounding errors in autoregressive rollouts, unless the dynamics model is trained directly on simulation error.
Single shooting also requires differentiating through a long trajectory, which can lead to exploding gradients.
Direct collocation potentially overcomes some of these limitations, but is often expensive to implement.
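The structural difference between the two transcriptions can be sketched as follows. This is a minimal illustration under stated assumptions: a scalar linear model stands in for the learned dynamics, and the collocation side is simplified to discrete-time defect constraints with states as decision variables. All names and the values of a and b are hypothetical.

```python
def dynamics(x, u, a=1.2, b=0.5):
    # Toy scalar stand-in for a learned one-step model (assumption).
    return a * x + b * u

def rollout(x0, controls):
    # Single shooting: only the controls are decision variables, and the
    # states are obtained by an autoregressive rollout of the model.
    xs = [x0]
    for u in controls:
        xs.append(dynamics(xs[-1], u))
    return xs

def defects(states, controls):
    # Transcription with states as decision variables: each defect
    # x[t+1] - f(x[t], u[t]) is a constraint the optimizer drives to zero.
    # Every defect depends only on x[t], u[t], and x[t+1].
    return [states[t + 1] - dynamics(states[t], controls[t])
            for t in range(len(controls))]

def shooting_sensitivity(horizon, a=1.2, b=0.5):
    # d x_T / d u_0 for the toy model: the chain rule multiplies by a at
    # every later step, giving a**(horizon - 1) * b.
    return a ** (horizon - 1) * b

controls = [0.1, -0.2, 0.3]
states = rollout(1.0, controls)
print(defects(states, controls))   # a rollout satisfies every defect
print(shooting_sensitivity(50))    # grows geometrically when |a| > 1
```

In the shooting form, the gradient of the final state with respect to the first control grows geometrically with the horizon whenever |a| > 1, while each defect has derivatives that do not depend on the horizon. The price is a larger decision vector and explicit constraints for the solver to handle.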
Choice of uncertainty measure: ensembles vs. GPs vs. DataDistance.
Ensembles underestimate uncertainty.
Ensembles are compute-intensive to train.
Ensembles have spurious local minima in the uncertainty landscape, which makes them unfriendly to gradient-based optimization.
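To make two of these measures concrete, the sketch below contrasts ensemble variance with a distance-to-data penalty. It assumes one plausible reading of DataDistance, namely the squared distance from a query (x, u) to the nearest training sample; the exact definition used in our method is not given here, and all names are hypothetical.

```python
def make_linear_model(a, b):
    # Toy stand-in for one trained ensemble member (assumption).
    def model(x, u):
        return a * x + b * u
    return model

def ensemble_variance(models, x, u):
    # Disagreement between ensemble members at the query point.
    preds = [m(x, u) for m in models]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def data_distance(dataset, x, u):
    # Squared distance from the query to the nearest training sample
    # (assumed definition, for illustration only).
    return min((x - xd) ** 2 + (u - ud) ** 2 for xd, ud in dataset)

dataset = [(0.0, 0.0), (1.0, 0.5)]
# Members that happened to converge to the same parameters.
models = [make_linear_model(1.0, 0.5) for _ in range(3)]

far_query = (10.0, 10.0)
print(ensemble_variance(models, *far_query))  # no disagreement reported
print(data_distance(dataset, *far_query))     # large: far from all data
```

When the members agree far from the data, the ensemble reports no uncertainty even though nothing supports the prediction there, which is the underestimation issue noted above. The distance measure still flags the query as out of distribution.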
The goal is to convincingly show the benefits of our method (first-order + direct collocation + DataDistance) over popular planning approaches such as MPPI with ensemble variance (zeroth-order + single shooting + ensembles).