Closed hhhzf0408 closed 1 year ago
Thank you for your interest in our work. From the paper: "During learning, the robot executes the same exploratory motion for each of the five object positions. The cost for this exploratory motion is then the average of these five trials. We, thus, use the expected cost of an exploratory motion." For instance, if the robot successfully grasped three out of five objects, the cost for failed grasping for this exploratory motion is (0 + 0 + 0 + 1 + 1)/5 = 0.4. Can you let me know which part of this is not clear?
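As a minimal sketch of this averaging (illustrative only; the `outcomes` list and its 0/1 encoding are my assumptions, not code from the paper):

```python
# Per-position grasp outcomes for one exploratory motion:
# 0 = successful grasp, 1 = failed grasp (hypothetical encoding).
outcomes = [0, 0, 0, 1, 1]  # grasped three of the five objects

# Expected cost of the exploratory motion = average over the five trials.
expected_cost = sum(outcomes) / len(outcomes)
print(expected_cost)  # 0.4
```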
and then repeats the above experiment after a reinforcement learning update
What do you mean by "experiment"? An episode/trial? A batch of episodes/trials for an update? Running one reinforcement learning process with N updates?
until the robot arm successfully grasps the object
As a result of the reinforcement learning process, the robot successfully grasps the objects at all positions where they are placed (some errors remain).
If so, is it possible for the mechanical arm to shift, explore in the opposite direction,
It is not at all clear to me what this sentence means.
Thank you for your reply. I understand the example you mentioned: (0 + 0 + 0 + 1 + 1)/5 = 0.4. At present, I am still not sure about the following points:
About the position of the water cup: the initial (perceived) position of the water cup is given at the start, and the DMP is initialized with this position as its end point. During training, the water cup is not at this initial position but appears at a random position somewhere around it; that position stays fixed until a successful grasp, and then it switches to the next random position. Is my understanding of the above correct?
About Fig. 10: "The object is placed {−6, −4, 0, 4, 6} cm from the perceived object position along either the x- or y-axis." Can I understand this sentence to mean that the water cup is placed along the x-axis or the y-axis, at each of the five positions −6, −4, 0, 4, and 6 cm, for the experiments? I do not quite understand the meaning of the two uncertainty distributions, so I do not understand Figure 10. If it is convenient, could you please explain it in detail?
I have been stuck on understanding these issues for a long time. Thank you for your help.
- About the position of the water cup: but randomly appears at a certain position around the initial position,
Not randomly. Each of the five positions is offered once. The order does not matter, so it is fixed.
and the position is fixed, until the successful capture, then it will switch to the next random position.
No. 0 + 0 + 0 + 1 + 1 means the grasp succeeded on the first three positions and failed on the last two.
- About Fig. 10: "The object is placed {−6, −4, 0, 4, 6} cm from the perceived object position along either the x- or y-axis." Can I understand this sentence as that the water cup is placed on the x-axis or y-axis, respectively at the five positions of −6, −4, 0, 4, and 6 for experiments?
Yes, this is what is written in the paper.
so I don't understand Figure 10
From the paper: "For each of the two uncertainty distributions (five positions aligned with either the x- or y-axis), three learning sessions were performed with ten updates per session." From the caption "Location of the goals g during learning [...] averaged over three learning sessions per uncertainty distribution." So the robot learns a movement for one of the two distributions. This learning is done three times for each distribution. I really don't know what to add to what is written in the paper to make it more clear.
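Putting the quoted protocol together, the loop structure might look like the following sketch (my own schematic, not the authors' code; `run_exploratory_motion` and `update_policy` are hypothetical stand-ins, and the dummy cost values are arbitrary):

```python
# Schematic of the learning protocol: two uncertainty distributions,
# three learning sessions per distribution, ten updates per session,
# and each exploratory motion evaluated at all five object positions.
OFFSETS_CM = [-6, -4, 0, 4, 6]        # displacements along one axis
DISTRIBUTIONS = ["x-axis", "y-axis"]  # the two uncertainty distributions

def run_exploratory_motion(policy, axis, offset_cm):
    # Hypothetical stand-in: a real system would execute the motion on
    # the robot and return this trial's cost (here: far positions fail).
    return 1.0 if abs(offset_cm) > 4 else 0.0

def update_policy(policy, expected_cost):
    # Hypothetical stand-in for the policy-improvement step.
    return policy

def learning_session(policy, axis, n_updates=10):
    for _ in range(n_updates):
        # The same exploratory motion is executed at all five positions;
        # its cost is the average over those five trials.
        costs = [run_exploratory_motion(policy, axis, d) for d in OFFSETS_CM]
        policy = update_policy(policy, sum(costs) / len(costs))
    return policy

# Three independent learning sessions per uncertainty distribution.
for axis in DISTRIBUTIONS:
    for _ in range(3):
        learning_session(policy=None, axis=axis)
```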
I have read many papers in this field. I roughly know what kind of work has been done, but I still cannot clearly understand what the experiment is doing, how I should design my own experiment, or what the practical significance of such an experiment is. I cannot even express my specific doubts clearly.
I am sorry to have taken up your time. I also know that the problem lies with my own understanding. As a beginner, I have no one around me who studies this direction, so I had no choice but to bother you. Maybe I am not suited for this direction, and I will consider switching to another one.
Thank you for your contribution to the study.
Thank you for your reply.
Reinforcement_Learning_With_Sequences_of_Motion_Primitives_for_Robust_Manipulation.pdf Hello, I want to ask you a question about your paper "Reinforcement Learning With Sequences of Motion Primitives for Robust Manipulation". I did not quite understand the meaning of Experiment 1. Suppose the center of the water cup is at (0, 0), and the center of the water cup is offset within some range due to external interference. The initial DMP uses the point (0, 0) as the target point to plan the path. The robot arm executes a grasp, returns to the initial position after a failure, and then repeats the above experiment after a reinforcement learning update, until the robot arm successfully grasps the object. My current understanding is as above; I think it may be wrong. According to the cost function, the optimization is related to the success of grasping, the acceleration at the end point, and the shape parameters. If so, is it possible for the mechanical arm to shift, explore in the opposite direction, and fail to execute successfully? As a beginner, I feel there is a large deviation in my understanding, and I hope to get your correction and reply. Thank you.
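For what it is worth, the three cost terms mentioned above (grasp failure, end-point acceleration, shape parameters) could be combined into a single scalar cost roughly like this (a sketch only; the weights `w_fail`, `w_acc`, `w_shape` and the quadratic penalty forms are my assumptions, not the paper's actual values):

```python
import numpy as np

def trial_cost(grasp_failed, endpoint_accelerations, shape_params,
               w_fail=1.0, w_acc=1e-4, w_shape=1e-6):
    # Weighted sum of the three terms from the question: a binary
    # grasp-failure penalty plus quadratic penalties on the end-point
    # accelerations and on the DMP shape parameters (weights are
    # illustrative, not taken from the paper).
    return (w_fail * float(grasp_failed)
            + w_acc * float(np.sum(np.square(endpoint_accelerations)))
            + w_shape * float(np.sum(np.square(shape_params))))

# With zero accelerations and zero shape parameters, only the
# grasp-failure term remains.
print(trial_cost(True, np.zeros(3), np.zeros(10)))  # 1.0
```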