stulp / dmpbbo

Python/C++ library for Dynamical Movement Primitives and Black-Box Optimization
GNU Lesser General Public License v2.1

May I ask a theoretical question #79

Closed: wzy-hhu closed this issue 2 years ago

wzy-hhu commented 2 years ago

Hello, your code is excellent. I'm a beginner with the DMP algorithm and don't understand some of its contents thoroughly. If it's convenient, I'd like to ask you about the specific significance of noise-based exploration.

The essence of DMPs is learning from demonstration, which allows us to modify the target position and generalize the demonstrated trajectory. For example, for the ball-throwing problem in your code, I extracted the initial trajectory of the ball's motion. Without noise-based exploration, I used the original DMP and directly modified the target point, and the resulting trajectory still appears to pass through the target point.

In papers from recent years, I have also seen people say that they need to use reinforcement learning to learn the parameters θ and the target point g; otherwise, if you change the target location using only the DMP, you will not be able to reach the target point. I don't really understand the role of reinforcement learning here.

In my understanding, a DMP that uses noise-based exploration can optimize a cost defined for different objectives, such as passing through a via-point or minimizing the acceleration at the target point, which the original DMP cannot do. Suppose I combine a DMP with a real robot to complete a task similar to grasping. If I do not control the velocity, acceleration, and other state information when reaching the target point, then although the DMP will theoretically reach the target point, the large accelerations and velocities may prevent it from stopping accurately at the desired point, or may even damage the equipment. Is there a problem with my understanding?
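For concreteness, a cost of the kind described above might look like the following sketch (the function name and weights are illustrative, not dmpbbo's API): penalize the squared distance to a via-point at a given time, plus the summed squared accelerations of the rollout.

```python
import numpy as np

# Illustrative rollout cost (hypothetical name and weights):
# via-point error at time t_via plus an acceleration penalty.
def rollout_cost(y, ydd, dt, viapoint, t_via, w_via=1.0, w_acc=1e-4):
    """y: (n_steps, n_dofs) positions; ydd: accelerations of one rollout."""
    i_via = int(t_via / dt)                      # index of the via-point time
    cost_via = w_via * np.sum((y[i_via] - viapoint) ** 2)
    cost_acc = w_acc * np.sum(ydd ** 2) * dt
    return cost_via + cost_acc
```

Exploration noise on the DMP parameters, followed by an update rule such as PI2, can then lower a cost like this, which imitation alone cannot.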

Thanks very much.

stulp commented 2 years ago

Hello, your code is excellent.

Thanks for the feedback!

I don't understand all your questions in detail:

Without noise-based exploration, I used the original DMP and directly modified the target point, and the resulting trajectory still appears to pass through the target point.

Yes, because convergence to the goal is an inherent property of DMPs, not a result of optimizing their parameters.
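For illustration, here is a minimal 1-D sketch of that property (toy constants, not dmpbbo's API): the forcing term is gated by the canonical state x, which decays to zero, so the spring-damper dynamics pull the output to g regardless of what was learned.

```python
import numpy as np

# Minimal 1-D DMP-style system with made-up constants.
tau, g, y0 = 1.0, 2.0, 0.0               # duration, goal, start
alpha, beta, alpha_x = 25.0, 6.25, 4.0   # common critically damped choice
dt = 0.001

y, v, x = y0, 0.0, 1.0
for _ in range(int(2.0 * tau / dt)):     # integrate well past tau
    f = 100.0 * np.sin(10.0 * x) * x     # arbitrary forcing term, gated by x
    v += dt / tau * (alpha * (beta * (g - y) - v) + f)
    y += dt / tau * v
    x += dt / tau * (-alpha_x * x)       # canonical system decays to 0

print(round(y, 3))  # ~2.0: converged to g; change g and it converges there too
```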

In papers from recent years, I have also seen people say that they need to use reinforcement learning to learn the parameters θ and the target point g; otherwise, if you change the target location using only the DMP, you will not be able to reach the target point. I don't really understand the role of reinforcement learning here.

Me neither ;-) Which papers are you referring to?

If I do not control the velocity, acceleration, and other state information when reaching the target point, then although the DMP will theoretically reach the target point, the large accelerations and velocities may prevent it from stopping accurately at the desired point, or may even damage the equipment. Is there a problem with my understanding?

That is correct. The DMP will theoretically converge to the target g. But if you pass accelerations of 10000 rad/s² to your robot joints on the way to g, it will explode before reaching it... That is why the license for dmpbbo includes a "no warranty" statement. It's up to the user to make sure that this does not happen.
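A hypothetical pre-execution check along these lines (the function name and limit are made up for illustration, not part of dmpbbo): estimate the accelerations of a planned rollout by finite differences and refuse to send it to the robot if any joint exceeds a limit.

```python
import numpy as np

ACC_LIMIT = 20.0  # rad/s^2, robot-specific; made-up value

def exceeds_acc_limit(positions, dt, limit=ACC_LIMIT):
    """positions: (n_steps, n_dofs) planned joint trajectory."""
    accelerations = np.diff(positions, n=2, axis=0) / dt**2
    return np.abs(accelerations).max() > limit

t = np.linspace(0.0, 2.0, 2001)
plan = np.stack([np.sin(3.0 * t), np.cos(3.0 * t)], axis=1)  # dummy 2-DOF plan
if exceeds_acc_limit(plan, dt=t[1] - t[0]):
    print("Refusing to execute: acceleration limit exceeded.")
else:
    print("Plan within acceleration limits.")
```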

wzy-hhu commented 2 years ago

Thank you for your reply, which answered many of my questions.

Regarding the papers: for example, in Section "D. Reinforcement Learning With Optimal Target" of the paper "DMP-Based Motion Generation for a Walking Exoskeleton Robot Using Reinforcement Learning", the authors mention: "However, constraints between joints can cause positional disturbances of the goal; so, we need to learn the goal exploration by the DMP parameter optimization."

In "Reinforcement Learning of Dual-Arm Cooperation for a Mobile Manipulator with Sequences of Dynamical Movement Primitives" mentions,"However, the uncertain external perturbations, errors of the feedback information, and little knowledge about the environment might cause the motion sequences deviated from the desired trajectories." / "In the learning process, the recorded joint trajectories on the first trial is used to initialize the DMPs, and then reinforcement learning (RL) improvement with Path Integrals (PI2) is used to learn the manipulation uncertainty information by interacting with environment and to adjust the DMPs base on the information."

For another example, the fifth section of the paper "Reinforcement Learning of Manipulation and Grasping Using Dynamical Movement Primitives for a Humanoidlike Mobile Manipulator" mentions: "However, that planning might not be successful in practice due to uncertainty perturbations, even if theoretical grasps and motion planning are successful. Nevertheless, the DMPs can be employed to generate goal directed movements with generalization and learning either in joint or task space due to its capability of being robust to those perturbations."

At present, I have only experimented in a simulation environment. I record the motion trajectory of the end effector and feed the recorded trajectory back to the manipulator in simulation. Eventually, the arm moves smoothly to the target point, but the intermediate trajectory is slightly different. If the deviation of the intermediate trajectory is interpreted as the uncertain disturbance of the manipulator that the authors mention, I still don't understand why the target position g would also be disturbed.

In my opinion, a DMP can accurately reach the target point, so it is not the DMP algorithm that is affected by the disturbance, but rather the planned trajectory that cannot be executed correctly due to disturbances in the manipulator itself. Why do the papers say that the DMP algorithm is robust to these disturbances? I think my own understanding must have gone wrong somewhere.
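For what it's worth, the robustness those papers refer to is usually the attractor property of the dynamical system: if the state is perturbed during execution, integrating the DMP from the measured state still converges to g. A minimal sketch with a toy 1-D system and made-up constants (not dmpbbo's API):

```python
import numpy as np

# Push the state away mid-rollout; the spring-damper dynamics
# still pull y back to the goal g.
tau, g, alpha, beta, dt = 1.0, 1.0, 25.0, 6.25, 0.001
y, v = 0.0, 0.0
for step in range(int(2.0 * tau / dt)):
    if step == 500:                      # external disturbance at t = 0.5 s
        y += 0.3
    v += dt / tau * alpha * (beta * (g - y) - v)
    y += dt / tau * v

print(round(y, 3))  # ~1.0: the goal is reached despite the perturbation
```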

Sorry for taking up your time, and thank you very much.

stulp commented 2 years ago

It would help if you included links to those papers, ideally to freely accessible PDFs.

At present, I have only experimented in a simulation environment. I record the motion trajectory of the end effector and feed the recorded trajectory back to the manipulator in simulation. Eventually, the arm moves smoothly to the target point, but the intermediate trajectory is slightly different. If the deviation of the intermediate trajectory is interpreted as the uncertain disturbance of the manipulator that the authors mention, I still don't understand why the target position g would also be disturbed.

The dynamical system guarantees convergence to g as t approaches infinity. Luckily you don't always have to wait that long ;-) If the value of tau for the trained DMP is for instance 2 seconds, try integrating the system for longer (e.g. 3 seconds). By that time, the output of the DMP should have converged to g; otherwise there is an issue with the code somewhere.
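In case it helps, here is a minimal sketch of that suggestion (a toy 1-D system with made-up constants, not dmpbbo's API); the distance to g shrinks as you integrate past tau:

```python
import numpy as np

tau, g, y0 = 2.0, 1.0, 0.0               # tau = 2 s, as in the example above
alpha, beta, alpha_x = 25.0, 6.25, 4.0   # critically damped choice
dt = 0.001

y, v, x = y0, 0.0, 1.0
n_steps = int(1.5 * tau / dt)            # integrate to 3 s, i.e. 1.5 * tau
for step in range(1, n_steps + 1):
    f = 50.0 * x * np.cos(8.0 * x)       # stand-in for a learned forcing term
    v += dt / tau * (alpha * (beta * (g - y) - v) + f)
    y += dt / tau * v
    x += dt / tau * (-alpha_x * x)
    if step in (int(1.0 * tau / dt), int(1.25 * tau / dt), n_steps):
        print(f"t = {step * dt:.2f} s, |y - g| = {abs(y - g):.5f}")
```

The printed error at t = 2 s is small but nonzero, and keeps shrinking at 2.5 s and 3 s, which is the convergence behaviour described above.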