hjsuh94 / irs_mpc

iRS-LQR: iterative Randomized Smoothing LQR
MIT License

High-level Goals & Storytelling #10

Open hjsuh94 opened 2 years ago

hjsuh94 commented 2 years ago

Summary of Meeting

  1. If we want to implement better threading (without multiple processes), Drake's MonteCarlo code that Calder implemented should be informative.
  2. We need to compare irs_lqr on quasistatic model vs. second-order model, in order to decouple the effectiveness of irs_lqr and the effectiveness of doing trajopt on a quasistatic model. We believe that the main advantages of doing a quasistatic model are two-fold:
    • Quasistatic sim allows taking longer timesteps stably. (Alejandro's Todorov relaxation might be an interesting contender to this)
    • Quasistatic sim allows more stable gradients by defining dynamics over longer horizons. Seems like we can roughly come up with the following table:
Sim \ Trajopt | Exact | Smoothing (First Order) | Smoothing (Zero Order)
--- | --- | --- | ---
2nd Order | | |
Quasistatic | | |
  3. Hongkai's work on hydroelastic vs. point contact might be relevant here. Would be nice to talk to Hongkai.
  4. Gradient vs. KKT condition. Is the gradient actually large (i.e., we're up against true non-differentiability), or are we just computing it in a bad way?
    • If the solver has a set precision, this might interfere with the noise of the inversion (?)
  5. Sphere geometry helps with decoupling hydroelastic from point contact, since it sidesteps the instabilities of non-smooth shapes. There are basically three sources of discontinuities:
    • Discontinuity of collision
    • Discontinuity of slip / stick (friction)
    • Discontinuity of shapes (signed distance function). Hydroelastic, in principle, should be trying to address the third issue in the same spirit.
  6. It seems like it's okay that we optimize locally, but it would be super important to make this more concrete: the more non-convex the dynamics become, the less effective local methods become, and methods like CEM / MPPI become more successful.
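The collision discontinuity above can be seen in a one-dimensional toy model. As a hedged sketch (the function names and constants here are hypothetical, not from the codebase), randomized smoothing averages a discontinuous-gradient contact force over injected noise, which is the basic mechanism iRS-LQR relies on:

```python
import numpy as np

def contact_force(q, k=100.0):
    # Penalty-style normal force for a point at height q above a wall at q = 0:
    # zero out of contact, linear in penetration depth otherwise. The gradient
    # jumps at q = 0 (the "discontinuity of collision" above).
    return k * np.maximum(0.0, -q)

def smoothed_contact_force(q, sigma=0.05, n_samples=10_000, seed=0):
    # Randomized smoothing: average the force over Gaussian perturbations of q.
    # The result is a smooth function of q even though contact_force is not.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n_samples)
    return float(np.mean(contact_force(q + w)))
```

Exactly at the contact boundary (q = 0) the exact force is zero, but the smoothed force is strictly positive (analytically k·sigma/sqrt(2π)), which is what gives the smoothed dynamics a useful gradient before contact is made. The stick/slip and shape discontinuities can be smoothed by the same averaging.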
hjsuh94 commented 2 years ago

TODOs

  1. Find out why our gradients are bad.

    • If there is a better numerical method to compute the gradient, then we should do it in order to make the comparison fair with the zero order method.
    • If we're truly up against high gradients, that would likely indicate a point of non-differentiability. We should understand where it comes from, and build an illustrative example to aid our understanding. The fundamental difficulty of computing such a gradient might be a point in favor of the zero-order method.
  2. Start implementing a second-order MBP for the existing examples.

  3. Complete the experiment section on the smooth examples.

    • Since many of the concerns (second-order MBP vs. quasistatic, hydroelastic vs. point contact) do not apply to smooth examples, it would be nice to write down those results completely.
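On the first TODO, one sanity check is to compare the analytic (sub)gradient against a zero-order, randomized-smoothing estimate near a point of non-differentiability. A minimal sketch (illustrative functions only, not the repo's API), using a ReLU-like kink as a stand-in for a contact nonlinearity:

```python
import numpy as np

def f(x):
    # A kink at x = 0: differentiable everywhere except the origin,
    # where the left and right derivatives disagree (0 vs. 1).
    return max(x, 0.0)

def zero_order_gradient(f, x, sigma=0.1, n=20_000, seed=0):
    # Monte-Carlo gradient of the Gaussian-smoothed f_sigma(x) = E[f(x + w)],
    # via the score-function estimator with f(x) as a variance-reduction baseline.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n)
    fx = np.array([f(x + wi) for wi in w])
    return float(np.mean((fx - f(x)) * w) / sigma**2)
```

At the kink the estimate is roughly 0.5, the average of the two one-sided derivatives: bounded and well defined even where the exact gradient is not. Away from the kink it agrees with the true derivative, which makes the comparison with the first-order method fair on smooth regions.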

Concerns

  1. What is the point of our work? It is to understand. But where do we cut off this chunk of understanding to carve out as a single paper?

    • If iRS-LQR is our main contribution, why do we spend so much time investigating other potential models for contact?
    • If the unique combination of iRS-LQR and quasistatic sim carves out a niche that excels compared to the alternatives (iRS-LQR with second-order sim, or quasistatic sim with iLQR), it makes sense for us to verify this experimentally.
    • However, it is unclear whether any of this will fit into 8 pages. One alternative might be to publish part of it as a conference paper, and leave the rest for a journal paper. Russ will probably advise us once we exchange drafts.
  2. If we compare against a second-order model, would we command forces directly, or put a position controller in the loop for the dynamics? Is that a fair comparison?

    • We could do both, where the input is either a direct force or a position command to a PD controller.
    • The PD controller would have a P gain equal to the position-controller gain of the quasidynamic sim, while the D gain can be set for critical damping.
  3. Should we be concerned about hydroelastic at all? It seems tangential to the paper as long as we stick to sphere geometries.
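For the PD-controller concern, the critically damped D gain follows directly from the closed-loop second-order system. A small sketch under the assumption of a single-joint, point-mass model (the function names are hypothetical):

```python
import numpy as np

def critically_damped_kd(kp, mass):
    # Closed loop: mass*qdd + kd*qd + kp*q = 0. Critical damping
    # (damping ratio 1, fastest response without overshoot) requires
    # kd = 2*sqrt(mass*kp).
    return 2.0 * np.sqrt(mass * kp)

def pd_step_response(kp, kd, mass=1.0, q0=1.0, dt=1e-3, steps=5000):
    # Semi-implicit Euler rollout from q = q0 toward q_des = 0.
    # Returns the final error and the minimum error seen (overshoot check).
    q, qd, q_min = q0, 0.0, q0
    for _ in range(steps):
        qd += dt * (-kp * q - kd * qd) / mass
        q += dt * qd
        q_min = min(q_min, q)
    return q, q_min
```

With kp matched to the quasidynamic sim's position-controller gain, this gives a concrete, reproducible choice of kd for the second-order comparison.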

hjsuh94 commented 2 years ago

Story of the Work: View from Planning through Contact

  1. Traditionally, people have used second-order formulations of dynamics with smoothing schemes (applied to either the forces or the constraints) to tackle planning-through-contact problems efficiently.
  2. iRS-LQR works well on the usual second-order formulation of dynamics using the penalty method, and produces effects similar to works that explicitly smooth out the forces.
  3. We believe the quasidynamic formulation of dynamics has a lot to contribute over traditional second-order methods for trajectory optimization, by allowing us to simulate with longer timesteps and giving more stable gradients over those long timesteps.
  4. However, unlike explicit force smoothing, we don't really know how to make an explicit smooth approximation of dynamics defined by implicit programs and constraints (TODO: how does Michael's scheme fit in here?)
  5. iRS-LQR deals with this case quite effectively by smoothing via sampling, and no prior work to date has provided an alternative scheme that combines the benefits of smoothing with the benefits of quasistatic simulation.
  6. The resulting combination of iRS-LQR with quasistatic sim improves over the classic second-order formulation on difficult manipulation problems.
  7. The zero-order nature of iRS-LQR might relieve some of the tension in contact simulation that trades off physical realism for gradients.

In this line of story, we have the following things to prove:

  1. iRS-LQR improves second-order trajectory optimization over naive gradients.
  2. iRS-LQR improves quasidynamic trajectory optimization over naive gradients.
  3. iRS-LQR with quasidynamic trajopt is better than iRS-LQR with second-order trajopt (this can also be established as a computational-efficiency argument).

Story of the Work: View from Non-smooth Optimization

  1. Naive gradient descent fails for the simplest cases of non-smooth problems, so we shouldn't expect strategies like SQP to work very well.
  2. Randomized smoothing is a strategy that can tackle the inherent non-convexity and non-smoothness in the problem.
  3. Unlike existing works in planning-through-contact, iRS-LQR attempts to not only mitigate the non-smoothness of the problem, but to some extent, the non-convexity of the problem as well.
  4. There are problems in which exact gradient descent deterministically fails, while iRS-LQR can recover from bad initialization probabilistically.
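Point 4 can be made concrete with a one-dimensional "plateau" cost: the exact gradient is zero almost everywhere, so exact gradient descent is stuck from any initialization on the plateau, while descent on the sampled gradient of the smoothed cost escapes. A hedged sketch (a toy cost, not one of the paper's examples):

```python
import numpy as np

def cost(x):
    # Plateau cost: 1 until x crosses 1.0, then 0. The exact gradient is
    # zero almost everywhere, so exact gradient descent never moves.
    return 1.0 if x < 1.0 else 0.0

def smoothed_cost_gradient(x, sigma=0.5, n=20_000, seed=0):
    # Zero-order estimate of d/dx E[cost(x + w)], w ~ N(0, sigma^2), with
    # cost(x) as a variance-reduction baseline.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n)
    fx = np.array([cost(x + wi) for wi in w])
    return float(np.mean((fx - cost(x)) * w) / sigma**2)

# Gradient descent on the smoothed cost walks off the plateau; exact
# gradient descent from x = 0 would stay at x = 0 forever.
x = 0.0
for step in range(10):
    x -= 2.0 * smoothed_cost_gradient(x, seed=step)
```

The injected noise "sees" the cliff before the iterate reaches it, which is the probabilistic recovery from bad initialization described above.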

Story of the Work: View from Reinforcement Learning

  1. Many papers in model-based reinforcement learning / policy search use a probabilistic formulation of dynamics, which we believe is used to relax some of the hardness of the deterministic formulations we had (i.e., we have long known that the expected value is a better-behaved cost).
  2. So what does stochastic modeling of the system offer us: what aspect of it makes the problem easier, and does it really make sense for non-smooth and/or combinatorial problems with contact dynamics?
  3. We start by analyzing the case where probability is injected into a deterministic non-smooth system in order to make a smoother approximation of the system. (NOTE: the question could perhaps be divided into where we inject noise and where we take the expected value. For policy search in MDPs it's often a total expectation over randomness in the initial condition, the dynamics, and the policy. Here we take it over the dynamics only.)
  4. Policy search does a lot of zero-order optimization (e.g., the policy-gradient trick), and we also want to know whether this has some fundamental benefit over using true gradients of the simulation, by avoiding subtle cases like the Heaviside function / staircase effect.
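On point 4, the Heaviside case makes the distinction sharp: a first-order estimator that averages the true derivative at sampled points gets exactly zero, while the zero-order (policy-gradient-style) estimator recovers the gradient of the smoothed function. A minimal sketch (illustrative only, not the repo's estimators):

```python
import numpy as np

def heaviside(x):
    return 1.0 if x > 0.0 else 0.0

def first_order_estimate(x, sigma=0.3, n=50_000, seed=0):
    # "First-order" smoothing averages the exact derivative at sampled points:
    # E[f'(x + w)]. The Heaviside derivative is zero almost everywhere, so
    # this estimator is identically ~0 no matter how many samples we draw.
    return 0.0

def zero_order_estimate(x, sigma=0.3, n=50_000, seed=0):
    # Zero-order smoothing: score-function estimate of d/dx E[f(x + w)].
    # For the Heaviside at x = 0, the true value is phi(0)/sigma > 0.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=n)
    fx = np.array([heaviside(x + wi) for wi in w])
    return float(np.mean((fx - heaviside(x)) * w) / sigma**2)
```

This is the staircase pathology in miniature: the simulator's true gradients can be exactly zero (or huge) at the places that matter, while the zero-order estimate tracks the gradient of the smoothed surrogate.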