Topics for Meeting 04/24

Overall, we should focus our efforts on what's necessary for the paper.

DataDistance vs. ScoreMatching - do we also want to show that optimal control w/ data distance penalty is empirically equivalent to modifying gradients with score matching?
NoiseConditionedEstimation - do we want to anneal variances or train with a single variance?
What's the minimum set of questions that we want the experiments to answer? Goal: let's not attempt anything too impressive or overly complicated; instead, let's have a crisp set of experiments that prove the point.
- 3.1. Illustrative (and very analyzable) low-dimensional example where distribution risk is helpful. (Hongkai)
  - We want to show that optimizing without distribution risk leads trajectories to go out of the data distribution.
  - We want to show that outside of this data distribution, the learned dynamics are inaccurate, which causes the planner to perform poorly.
  - We want to show that with distribution risk, the trajectories stay inside the data distribution, leading to better performance.
  - [Optional] Using data distance leads to performance similar to score matching.
  - The right experiment: Hongkai's corridor? perhaps even with single-integrator dynamics?
- 3.2. Comparison against existing methods on complex examples (Terry)
  - We want to show that, compared to existing approaches (MOPO / CQL), our method achieves comparable (or better) performance on existing benchmarks. (For MOPO, comparable is acceptable since ensembles take a long time to train.)
  - The right experiment: D4RL MuJoCo / Adroit tasks
- 3.3. Scalability of our method to pixel-based control problems (Glen / Lu)
  - We want to leverage the scalability of denoising score matching to show that we achieve good performance on pixel-based problems.
  - The right experiment: TBD
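
For the DataDistance vs. ScoreMatching question (and the optional item in 3.1), here is a minimal sketch of why the two penalties can coincide. It assumes, purely for illustration, that the data distance is a softmin of squared distances with temperature `sigma**2`; the issue does not define it, and the names `data_distance` and `kde_score` are hypothetical. Under that assumption, the gradient of the data distance equals `-sigma**2` times the score of the data perturbed with Gaussian noise of standard deviation `sigma`, which the snippet checks with finite differences.

```python
import numpy as np

def data_distance(x, data, sigma):
    """Softmin of half squared distances to the data (assumed definition)."""
    e = -0.5 * np.sum((x - data) ** 2, axis=1) / sigma**2
    m = e.max()  # subtract the max for a stable log-sum-exp
    return -sigma**2 * (m + np.log(np.sum(np.exp(e - m))))

def kde_score(x, data, sigma):
    """Score (gradient of log-density) of the Gaussian-perturbed empirical data."""
    diff = data - x
    e = -0.5 * np.sum(diff**2, axis=1) / sigma**2
    w = np.exp(e - e.max())
    w /= w.sum()  # softmax weights over data points
    return (w[:, None] * diff).sum(axis=0) / sigma**2

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))  # synthetic stand-in for a dataset
x = np.array([0.3, -0.7])
sigma = 0.5

# Central finite-difference gradient of the data distance.
eps = 1e-5
grad_fd = np.array([
    (data_distance(x + eps * np.eye(2)[i], data, sigma)
     - data_distance(x - eps * np.eye(2)[i], data, sigma)) / (2 * eps)
    for i in range(2)
])

# Gradient predicted from the score of the perturbed data distribution.
grad_from_score = -sigma**2 * kde_score(x, data, sigma)
```

With a learned score network the match would only be approximate, so the empirical comparison in the agenda would still be informative.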

> Hongkai's corridor? perhaps even with single-integrator dynamics?

I like the idea of using single (or maybe double) integrator dynamics in this case. It certainly makes the trajectory optimization a lot easier. One challenge I observed when working with the cart-pole dynamics is that, because the dynamics are highly nonlinear, the shooting method doesn't work well once I add the score matching cost. I expect the trajectory optimization to be much easier for a linear system.
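
To make the single-integrator idea concrete, here is a rough sketch of single-shooting trajectory optimization with a data-distance penalty. Everything in it is an assumption for illustration: the L-shaped "corridor" data, the cost weights, the softmin data distance, and the helper names `softmin_grad` and `plan`. None of it is taken from the repo. Because the dynamics are `x[t+1] = x[t] + dt * u[t]`, the gradient with respect to the controls is just a reverse cumulative sum of the state gradients.

```python
import numpy as np

def softmin_grad(xs, data, sigma):
    """Gradient of the softmin squared distance to the data at each state."""
    diff = xs[:, None, :] - data[None, :, :]           # (T, N, 2)
    e = -0.5 * np.sum(diff**2, axis=-1) / sigma**2     # (T, N)
    w = np.exp(e - e.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # softmax weights
    return np.einsum("tn,tnd->td", w, diff)            # x minus weighted data mean

def plan(beta, data, x0, goal, T=20, dt=0.1, sigma=0.1,
         r=0.01, w_goal=10.0, step=0.02, iters=3000):
    """Gradient descent on controls for x[t+1] = x[t] + dt * u[t]."""
    u = np.zeros((T, 2))
    for _ in range(iters):
        xs = x0 + dt * np.cumsum(u, axis=0)            # rollout: states x_1..x_T
        g = beta * softmin_grad(xs, data, sigma)       # data-distance penalty
        g[-1] += 2.0 * w_goal * (xs[-1] - goal)        # terminal goal cost
        # u[s] affects every later state, hence the reverse cumulative sum.
        grad_u = 2.0 * r * u + dt * np.cumsum(g[::-1], axis=0)[::-1]
        u -= step * grad_u
    return x0 + dt * np.cumsum(u, axis=0)

# Hypothetical data: an L-shaped corridor from (0, 0) to (1, 0) to (1, 1).
s = np.linspace(0.0, 1.0, 50)
data = np.concatenate([np.stack([s, np.zeros_like(s)], axis=1),
                       np.stack([np.ones_like(s), s], axis=1)])
x0, goal = np.array([0.0, 0.0]), np.array([1.0, 1.0])

def mean_dist_to_data(xs):
    d = np.linalg.norm(xs[:, None, :] - data[None, :, :], axis=-1)
    return d.min(axis=1).mean()

xs_plain = plan(0.0, data, x0, goal)   # no penalty: cuts the corner diagonally
xs_pen = plan(1.0, data, x0, goal)     # with penalty: pulled toward the corridor
```

Comparing `mean_dist_to_data(xs_pen)` against `mean_dist_to_data(xs_plain)` gives a quick check of whether the penalized plan stays closer to the data.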

As you mentioned in #40, we have several examples of the "extrapolative regime" (car in the corridor, box pushing, etc.). I have been thinking hard about the "interpolative regime". One idea: place stepping stones for the half cheetah and collect data of the half cheetah walking on them, while the ground in between the stones is a completely different material (like sand or mud). Dynamics interpolated between the stepping stones, where we collected the data, would then be inaccurate, so we want to use the score function to force the half cheetah to step on the stones. WDYT?
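
A 1D toy version of the stepping-stone idea, as a sketch only: the stone positions, `sigma`, and the helper names `kde_score` and `snap_to_data` are made up for illustration, and the score is computed in closed form from the stone locations rather than learned. Following the score from a point between two stones pulls a candidate foot placement back onto the nearest stone.

```python
import numpy as np

stones = np.array([0.0, 1.0, 2.0])  # hypothetical stone centers (where data exists)
sigma = 0.15                        # assumed noise level of the perturbed data

def kde_score(x, data, sigma):
    """Score of the Gaussian-perturbed data distribution at a scalar x."""
    diff = data - x
    e = -0.5 * diff**2 / sigma**2
    w = np.exp(e - e.max())
    w /= w.sum()
    return np.sum(w * diff) / sigma**2

def snap_to_data(x, steps=100, lr=0.5):
    """Gradient ascent on the log-density, scaled by sigma**2."""
    for _ in range(steps):
        x = x + lr * sigma**2 * kde_score(x, stones, sigma)
    return x

foot_a = snap_to_data(0.3)  # starts between stones 0 and 1, closer to stone 0
foot_b = snap_to_data(0.7)  # starts between stones 0 and 1, closer to stone 1
```

In the actual experiment the score would enter as a cost in the planner rather than as a projection step, so this only shows the direction in which the score pushes.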
