tkkim-robot / smooth-mppi-pytorch

A pytorch implementation of Smooth Model Predictive Path Integral control (SMPPI)
MIT License

Discrepancies between the paper and the code #1

Open edvinmandrejev opened 1 day ago

edvinmandrejev commented 1 day ago

Hi

I have been trying to use your smooth MPPI paper (https://arxiv.org/pdf/2112.09988) for controlling robots. However, the performance I have been getting is very different from the paper. Specifically, I have the following doubts.

  1. I could not understand how you update the action from the control. In line 84 (https://github.com/tkkim-robot/smooth-mppi-pytorch/blob/main/pytorch_mppi/smooth_mppi.py), the updated action is obtained by directly adding the updated control. Shouldn't there be a delta_t somewhere, as written in the paper?

  2. According to the paper, the control is the derivative of the action. But the control and action sequences generated by the smppi code do not seem to follow this relationship. To verify this, I computed the control through finite differences of the action (a minimal sketch of this check is included after this list). The finite differences give a widely different vector from the control sequence generated by smppi. What could I be missing here? This holds true even when the dt of the dynamics or the finite difference is small.

  3. In the paper, it is recommended to independently clip the action and the control (this is also done in the code). But then, how does this preserve the derivative relation between the action and the control?
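
For reference, here is a minimal sketch of the check I describe in point 2. `action_seq`, `control_seq`, and `dt` are placeholders for the action/control sequences returned by the solver and the dynamics time step; this is only how I verify the relation, not code from the repository.

```python
import torch

def check_derivative_relation(action_seq, control_seq, dt):
    # Finite differences of the action sequence along the horizon.
    fd_control = (action_seq[1:] - action_seq[:-1]) / dt
    # If a_{i+1} = a_i + u_i * dt, each finite difference should match u_i.
    err = (fd_control - control_seq[:-1]).abs().max()
    print(f"max |finite difference - control| = {err.item():.4f}")
    return err
```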

tkkim-robot commented 18 hours ago

Hi,

Thanks for raising those concerns. I received a very similar question a while ago, and I will share those answers here. I hope most of your questions can be addressed by my answers from the past.

Q1: Where does \Delta t come in for Algorithm 1? It's not listed as a Given and isn't computed. I'm assuming that this is the control period for continuous time problems. What value do you use for your experiments? For the race car example, from "The feedforward control frequency was 10 Hz" I'm assuming you used 0.1s? What if the underlying problem is actually discrete time?

A1: Technically it should be 0.1 s, but in the implementation I just use \Delta t = 1, so there is no multiplication by \Delta t. If I set \Delta t to 0.1, I can get almost the same behavior by increasing the sampling variance. In more concrete notation, I treat the control space as the \Delta of the action (without \Delta t), and simply tune the sampling variance appropriately for each application. So the system can be discrete. I know it's confusing :( If I could go back three years, I would clarify it in the paper.
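
A minimal sketch of what I mean (illustrative only, not the repository code verbatim; the variance rescaling is a rough rule of thumb):

```python
import torch

T, nu = 30, 2                      # horizon length and action dimension (placeholders)
dt = 0.1                           # control period from the race-car example

action_seq = torch.zeros(T, nu)    # nominal action sequence A
control_seq = torch.zeros(T, nu)   # nominal control (derivative) sequence U

# Paper notation:        A <- A + U * dt
# This implementation:   A <- A + U          (i.e. Delta t = 1)
action_paper = action_seq + control_seq * dt
action_impl = action_seq + control_seq

# Roughly the same behavior can be recovered by rescaling the sampling std:
sigma_paper = torch.tensor([0.5, 0.5])   # hypothetical std if dt were applied
sigma_impl = sigma_paper * dt            # std tuned for the dt-free update
```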

Q2: As far as I understand, the core idea of the paper is to sample Gaussian noise in the time-derivative of the control input (now called the control space), integrate it over time to recover the original control input (now called the action space), and then impose a smoothing cost on the action space. Both the control and action sequences are kept and shifted forward to be treated as the nominal trajectories to sample around.

A2: Correct!

Q3: Do you need many iterations of warm-starting because the initial action sequence will be bad, and due to it being corrected only by integrating the control sequence, you'd need to do it iteratively to correct it significantly? How do you initialize your control and action sequences if they're not given? I'm assuming control is uniform noise while action is all 0?

A3: Yes. I simply set the actions to zero at the first stage, so it requires more warm-starting from the initial stage than the original MPPI. Since my previous research mostly focused on autonomous driving, the system state always starts at all zeros (vx, vy, and yaw_rate are zero). It always takes some time to accelerate, so setting the actions to all zeros was not a problem. It was enough for my application, but yes, it's a disadvantage.
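
Roughly, the initialization and the warm-start shift look like this (just a sketch with placeholder names, not the exact repository code; the zero-initialized control sequence is shown only for illustration):

```python
import torch

T, nu = 30, 2                      # horizon length and action dimension
action_seq = torch.zeros(T, nu)    # actions start at all zeros
control_seq = torch.zeros(T, nu)   # nominal controls also start at zero here

def shift_for_warm_start(seq):
    # Shift the nominal sequence one step forward and repeat the last entry,
    # so the previous solution warm-starts the next optimization.
    return torch.cat([seq[1:], seq[-1:]], dim=0)

# After each control cycle:
action_seq = shift_for_warm_start(action_seq)
control_seq = shift_for_warm_start(control_seq)
```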

tkkim-robot commented 18 hours ago

Let me add more explanation for your questions 2 and 3.

I'm afraid you may be misunderstanding the concept. The control input along the "real-world" time horizon (in this case, the difference between the action sent to the robot at time t and the one sent at time t+1) is different from the input lifting done over the SMPPI optimization horizon. The input-lifting technique is applied along the MPC horizon. To clarify this, I distinguished the t-axis and the i-axis in the paper (it might be confusing, but I still think it's a good way of presenting the idea).
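
Here is a toy illustration of the difference (the solver below is a stub, not the repository code). Within a single optimization, the actions are the integral of the controls along the i-axis; but the actions actually sent to the robot at successive real times come from different optimizations, so their finite differences along the t-axis need not match any single control sequence.

```python
import torch

def fake_solve(prev_action, T=10, nu=1):
    # Stub for one SMPPI optimization: sample a control (derivative) sequence
    # and integrate it along the i-axis to obtain the action sequence.
    control_seq = torch.randn(T, nu) * 0.1
    action_seq = prev_action + torch.cumsum(control_seq, dim=0)
    return action_seq, control_seq

prev_action = torch.zeros(1)
applied_actions = []                        # actions actually sent to the robot (t-axis)
for t in range(20):
    action_seq, control_seq = fake_solve(prev_action)
    # Along the i-axis the derivative relation holds within this single solve
    # (up to floating-point rounding):
    assert torch.allclose(action_seq[1:] - action_seq[:-1], control_seq[1:], atol=1e-5)
    applied_actions.append(action_seq[0])   # only the first action is applied
    prev_action = action_seq[0]

# Differences between consecutive entries of `applied_actions` (the t-axis)
# come from different optimizations, so they need not match any one control
# sequence computed along the i-axis.
```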

Regarding your question 3, it's a good point. There might be better ways than clipping the control input directly based on the input constraints, but in practice, many implementations still clip it directly (as I did in this case).
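
For example (placeholder bounds, not the exact limits used in the repository):

```python
import torch

u_max, a_max = 0.5, 1.0                          # hypothetical control / action limits
control_seq = torch.randn(10, 1)
action_seq = torch.cumsum(control_seq, dim=0)    # derivative relation holds exactly here

control_seq = control_seq.clamp(-u_max, u_max)   # clip controls independently
action_seq = action_seq.clamp(-a_max, a_max)     # clip actions independently

# After clipping, action_seq[i] - action_seq[i-1] may no longer equal
# control_seq[i]; the relation is preserved only where no clipping was active.
```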

I suggest you take a look at this SMPPI implementation as well:

https://github.com/UM-ARM-Lab/pytorch_mppi/tree/smppi

If you want to test SMPPI in this implementation, please set w_action_seq_cost to 0.05.