stulp / dmpbbo

Python/C++ library for Dynamical Movement Primitives and Black-Box Optimization
GNU Lesser General Public License v2.1

a question about your paper #82

Closed hhhzf0408 closed 1 year ago

hhhzf0408 commented 1 year ago

Reinforcement_Learning_With_Sequences_of_Motion_Primitives_for_Robust_Manipulation.pdf

Hello, I would like to ask a question about your paper "Reinforcement Learning With Sequences of Motion Primitives for Robust Manipulation". I didn't quite understand Experiment 1. My current understanding is as follows: suppose the center of the water cup is at (0, 0), and due to some external disturbance the cup is offset within some range around that point. The initial DMP plans a path with (0, 0) as the goal point. The robot arm executes a grasp, returns to the initial position after a failure, and then repeats the experiment after a reinforcement-learning update, until the robot arm successfully grasps the object. I think this understanding must be wrong. According to the cost function, the optimization depends on grasp success, the acceleration at the end point, and the shape parameters. If so, is it possible for the mechanical arm to shift, explore in the opposite direction, and fail to execute successfully? As a beginner, I feel there is a large deviation in my understanding, and I hope for your correction and reply. Thank you.

stulp commented 1 year ago

Thank you for your interest in our work. From the paper:

> During learning, the robot executes the same exploratory motion for each of the five object positions. The cost for this exploratory motion is then the average of these five trials. We, thus, use the expected cost of an exploratory motion.

For instance, if the robot successfully grasped three out of five objects, the cost for failed grasping for this exploratory motion is (0 + 0 + 0 + 1 + 1)/5 = 0.4. Can you let me know which part of this is not clear?
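For concreteness, here is a minimal Python sketch of that computation (hypothetical names, not dmpbbo's actual API; the full cost in the paper also includes acceleration and shape-parameter terms):

```python
# Expected failed-grasp cost of ONE exploratory motion, averaged over
# the five object positions. `grasp_failed` is a hypothetical stand-in
# for executing the motion on the real robot and observing the outcome.
def expected_failure_cost(motion, positions, grasp_failed):
    failures = [1.0 if grasp_failed(motion, p) else 0.0 for p in positions]
    return sum(failures) / len(failures)

# Three successes followed by two failures -> (0 + 0 + 0 + 1 + 1)/5 = 0.4
```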

> and then repeats the experiment after a reinforcement-learning update

What do you mean by "experiment"? An episode/trial? A batch of episodes/trials for one update? Running one full reinforcement-learning process with N updates?

> until the robot arm successfully grasps the object

As a result of the reinforcement-learning process, the robot successfully grasps the objects at all positions at which they are placed (some errors remain).

> If so, is it possible for the mechanical arm to shift, explore in the opposite direction,

It is not at all clear to me what this sentence means.

hhhzf0408 commented 1 year ago

Thank you for your reply. I understand the (0 + 0 + 0 + 1 + 1)/5 = 0.4 example you mentioned. At present, I am unsure about the following points:

  1. About the position of the water cup: the initial (perceived) position of the water cup is given at the start, and the DMP is initialized with this position as its end point. During training, the water cup is not at this initial position, but appears at some random position around it; that position stays fixed until a successful grasp, after which it switches to the next random position. Is my understanding of this part correct?

  2. About Fig. 10: "The object is placed {−6, −4, 0, 4, 6} cm from the perceived object position along either the x- or y-axis." Can I understand this as: the water cup is placed along the x-axis or the y-axis, at the five positions −6, −4, 0, 4, and 6 cm, for the experiments? I don't quite understand the meaning of the two uncertainty distributions, so I also don't understand Figure 10. If it is convenient, could you please explain it in detail?

I have been stuck on these questions for a long time. Thank you for your help.

stulp commented 1 year ago
> About the position of the water cup: [...] but appears at some random position around it,

Not randomly. Each of the five positions is offered once. The order does not matter, so it is fixed.

> that position stays fixed until a successful grasp, after which it switches to the next random position.

No. 0 + 0 + 0 + 1 + 1 means the grasp succeeded at the first three positions and failed at the last two.

> About Fig. 10: "The object is placed {−6, −4, 0, 4, 6} cm from the perceived object position along either the x- or y-axis." Can I understand this as: the water cup is placed along the x-axis or the y-axis, at the five positions −6, −4, 0, 4, and 6 cm, for the experiments?

Yes, this is what is written in the paper.
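For concreteness, the two "uncertainty distributions" are just the same five offsets applied along different axes. A minimal sketch, assuming the perceived object position is at the origin:

```python
# The same five offsets, applied along either the x- or the y-axis,
# yield the two "uncertainty distributions" (perceived position = origin).
offsets = [-0.06, -0.04, 0.0, 0.04, 0.06]      # meters
positions_x = [(d, 0.0) for d in offsets]      # x-axis distribution
positions_y = [(0.0, d) for d in offsets]      # y-axis distribution
```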

> so I also don't understand Figure 10

From the paper: "For each of the two uncertainty distributions (five positions aligned with either the x- or y-axis), three learning sessions were performed with ten updates per session." And from the caption: "Location of the goals g during learning [...] averaged over three learning sessions per uncertainty distribution." So the robot learns a movement for one of the two distributions, and this learning is repeated three times for each distribution. I really don't know what to add to what is written in the paper to make it clearer.
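If it helps, here is a minimal sketch of that protocol as nested loops. `update_goal` is a hypothetical placeholder; the real update is a reward-weighted policy-improvement step over exploratory motions, not the random nudge used here:

```python
import random

offsets = [-0.06, -0.04, 0.0, 0.04, 0.06]      # meters
distributions = {
    "x-axis": [(d, 0.0) for d in offsets],
    "y-axis": [(0.0, d) for d in offsets],
}

def update_goal(goal, positions):
    # Hypothetical placeholder for one policy update. The real algorithm
    # samples exploratory motions, averages their cost over all five
    # positions, and performs a reward-weighted update of the DMP goal g.
    target = random.choice(positions)
    return tuple(g + 0.3 * (t - g) for g, t in zip(goal, target))

for name, positions in distributions.items():  # two uncertainty distributions
    for session in range(3):                   # three learning sessions each
        goal = (0.0, 0.0)                      # start at perceived position
        for update in range(10):               # ten updates per session
            goal = update_goal(goal, positions)
```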

hhhzf0408 commented 1 year ago

I have read many papers in this field, so I have a general sense of what has been done, but I still cannot clearly understand what this experiment is doing, how I should design my own experiment, or what the practical significance of such an experiment is. I cannot even articulate my specific doubts.

I am sorry to have taken up your time. I know the problem lies with my own understanding. As a beginner, there is no one around me who studies this direction, so I had no choice but to bother you. Perhaps I am not suited to this direction, and I will consider switching to another one.

Thank you for your contribution to the study.

Thank you for your reply.