hjsuh94 opened 1 year ago
Hongkai's corridor? Perhaps even with single-integrator dynamics.
I like the idea of using single (or maybe double) integrator dynamics in this case. It certainly makes trajectory optimization a lot easier. One challenge I have observed when working with the cart-pole is that, because its dynamics are highly nonlinear, the shooting method doesn't work well once I add the score-matching cost. I expect trajectory optimization to be much easier for a linear system.
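To make the single-integrator point concrete, here is a minimal single-shooting sketch. It assumes a corridor dataset spread along the x-axis with y ~ N(0, sigma_y^2), so the score is known in closed form; in practice a learned score model would replace the hypothetical `corridor_score`. The cost is a terminal goal term, a control penalty, and a data term lam * sum_t -log p(x_t), whose gradient is -lam * score(x_t). Because the dynamics are linear, the gradient with respect to the controls is just a reverse cumulative sum of the per-state gradients, which is one reason shooting should be better behaved here than on the cart-pole. All names and weights (`lam`, `rho`, `lr`, `sigma_y`) are illustrative assumptions.

```python
import numpy as np

def rollout(x0, U, dt):
    """Single-integrator dynamics x_{t+1} = x_t + dt * u_t; returns x_0..x_T."""
    return np.vstack([x0, x0 + dt * np.cumsum(U, axis=0)])

def corridor_score(X, sigma_y=0.1):
    """Hypothetical stand-in for a learned score: data assumed spread along the
    x-axis with y ~ N(0, sigma_y^2), so grad log p(x) = (0, -y / sigma_y^2)."""
    S = np.zeros_like(X)
    S[:, 1] = -X[:, 1] / sigma_y**2
    return S

def optimize(x0, goal, T=20, dt=0.1, lam=1e-2, rho=1e-2, lr=0.4, iters=1000):
    """Gradient descent on the controls of a single-shooting problem with cost
    ||x_T - goal||^2 + rho * sum ||u_t||^2 + lam * sum_t -log p(x_t)."""
    U = np.zeros((T, 2))
    for _ in range(iters):
        X = rollout(x0, U, dt)  # shape (T + 1, 2)
        # dC/dx_t: the data term contributes -lam * score on every state,
        # and the terminal term contributes 2 (x_T - goal) on the last one.
        gX = -lam * corridor_score(X)
        gX[-1] += 2.0 * (X[-1] - goal)
        # Chain rule through x_t = x0 + dt * sum_{k<t} u_k:
        # dC/du_k = dt * sum_{t>k} dC/dx_t (reverse cumulative sum).
        gU = dt * np.cumsum(gX[::-1], axis=0)[::-1][1:] + 2.0 * rho * U
        U -= lr * gU
    return rollout(x0, U, dt), U
```

Comparing `lam=0` against `lam>0` for an off-center goal such as `(2.0, 0.5)` should show the trajectory staying closer to the corridor centerline before moving toward the goal, at the cost of not reaching it exactly.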
As you mentioned in #40, we already have several examples of the "extrapolative regime" (car in the corridor, box pushing, etc.), so I have been thinking hard about the "interpolative regime". One idea is stepping stones for the half cheetah. We collect data of the half cheetah walking on these stepping stones, but the ground between the stones is a completely different material (e.g., sand or mud). If we interpolate between the stepping stones where we collected data, the interpolated dynamics will be wrong. We therefore want to use the score function to force the half cheetah to step on the stepping stones. WDYT?
Overall, we should focus our efforts on what's necessary for the paper.
1. DataDistance vs. ScoreMatching: do we also want to show that optimal control with a data-distance penalty is empirically equivalent to modifying the gradients with score matching?
2. NoiseConditionedEstimation: do we want to anneal the noise variances or train with a single variance?
3. What's the minimum set of questions we want the experiments to answer? Goal: rather than trying to do something too impressive or overly complicated, have a crisp set of empirical experiments that prove the point.
3.1. Illustrative (and very analyzable) low-dimensional example where distribution risk is helpful (Hongkai)
3.2. Comparison against existing methods on complex examples (Terry)
3.3. Scalability of our method to pixel-based control problems (Glen / Lu)
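On the DataDistance vs. ScoreMatching question, one way to make the equivalence precise (an assumption about how the penalty could be defined, not something settled in this thread) is to use a soft-min squared distance to the dataset. That penalty equals -2 sigma^2 log p(x) for a Gaussian kernel density estimate with bandwidth sigma, up to an additive constant, so its gradient is exactly -2 sigma^2 times the KDE score. The sketch below checks this numerically; the function names and the choice of a KDE are illustrative.

```python
import numpy as np

def soft_data_distance(x, data, sigma):
    """Soft-min squared distance from x to the dataset:
    d(x) = -2 sigma^2 * log( mean_i exp(-||x - x_i||^2 / (2 sigma^2)) ).
    As sigma -> 0 this approaches min_i ||x - x_i||^2."""
    sq = np.sum((data - x) ** 2, axis=1) / (2.0 * sigma**2)
    m = np.max(-sq)  # log-sum-exp shift for numerical stability
    lse = m + np.log(np.mean(np.exp(-sq - m)))
    return -2.0 * sigma**2 * lse

def kde_score(x, data, sigma):
    """Score grad_x log p(x) of a Gaussian KDE with bandwidth sigma:
    a softmax-weighted average of (x_i - x) / sigma^2."""
    sq = np.sum((data - x) ** 2, axis=1) / (2.0 * sigma**2)
    w = np.exp(-(sq - sq.min()))
    w /= w.sum()
    return (w[:, None] * (data - x)).sum(axis=0) / sigma**2
```

Under this definition, gradient descent on the data-distance penalty and adding a scaled score to the gradient are the same update; with a learned score model in place of the KDE, the equivalence would hold only approximately, which is what an empirical comparison would test.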
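For the NoiseConditionedEstimation question, the difference between annealing and a single variance comes down to which noise levels enter the denoising score-matching loss. A minimal sketch, assuming a geometric schedule of noise levels and a sigma^2-weighted loss in the style of noise-conditioned score networks; `geometric_sigmas`, `dsm_loss`, and `multi_level_loss` are hypothetical names, and `score_fn(x_noisy, sigma)` stands in for whatever network we train.

```python
import numpy as np

def geometric_sigmas(sigma_max, sigma_min, n_levels):
    """Noise levels decreasing geometrically from sigma_max to sigma_min."""
    return np.geomspace(sigma_max, sigma_min, n_levels)

def dsm_loss(score_fn, x, sigma, rng):
    """Denoising score-matching loss at one noise level, weighted by sigma^2.
    The target score for x_noisy = x + sigma * eps is -eps / sigma, so the
    weighted residual is sigma * s(x_noisy, sigma) + eps."""
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    r = sigma * score_fn(x_noisy, sigma) + eps
    return float(np.mean(np.sum(r**2, axis=1)))

def multi_level_loss(score_fn, x, sigmas, rng):
    """Average of the per-level losses; the single-variance baseline is
    simply sigmas=[sigma]."""
    return float(np.mean([dsm_loss(score_fn, x, s, rng) for s in sigmas]))
```

For data x ~ N(0, I), the exact noisy score at level sigma is -x_noisy / (1 + sigma^2), which gives a convenient sanity check: it should score a lower loss than a zero score at every level.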