pranshumalik14 commented 2 years ago

These questions are specifically about utilizing discrete DMPs (implemented in this repository) for the reaching task, especially in the context of this paper (https://onlinelibrary.wiley.com/doi/abs/10.1111/desc.12638).

For multi-joint arms, what is the input trajectory for training?
- We noticed the code (in the above repository) and your proximodistal learning paper seem to follow different approaches. The code involves trajectories in world coordinates and the paper involves trajectories in joint angles. Therefore, in order to use your code, is it accurate if we provide joint angles as inputs? We would not like to use inverse kinematics during the movement of the robot (output) since the purpose of our project is to develop an alternative method for arm reach, based on human-learning processes.
We were assuming that only the end-effector trajectory is passed in for training the DMPs. However, in that case, when are joint trajectories calculated? Or are joint trajectories known initially?
Is the forcing function in the paper (and its MATLAB code) the same as joint acceleration in the paper, section 3.1.3? If so, then can we integrate the output and send joint positions (angles) to the robot to follow?

Any help or pointers on this will be highly appreciated!

stulp commented 2 years ago

Thank you for your question. The Developmental Science paper had a very different aims from this specific repository, which may indeed be confusing. Let me try to clarify.

Robotics

The code in the current dmpbbo GitHub repo focuses on robotics, and the focus is on:

Policy Dynamic Movement Primitives (DMPs)
- DMPs are dynamical systems which have an attractor which you can change, which is convenient in goal-directed movements for robotics (attractor = goal)
Output (State) The state representation in a DMPs is generic and not specified
- It can thus represent the pos/vel/acc of a joint, an end-effector, or the desired temperature in room (though the latter would not be an obvious thing to do). Although dmpbbo does not limit what it can represent, on a real robot one usually prefers end-effector trajectories (task-space generalization is better), which are then converted during execution to joint pos/vel/acc through inverse (differential) kinematics.
Initialization Initializing a trajectory from a demonstration
- When learning on real robotics, it's important to have a "warm start" for RL

Developmental Science article

The Developmental Science article has a different focus, i.e. on modelling how infants learn to reach towards a certain position from scratch. The resulting differences to the robotics approach are:

Policy Since the aim is learning how to reach to a certain position, the policy should not have an attractor to that position (as in a DMP), because then there's nothing to learn anymore. Therefore, the policy is not a dynamical system with an attractor, but "just" a radial basis function network (equation 4)
Output (State) The policy generates accelerations in joint space (see equation 3). The underlying assumption is that infants have not yet learned inverse kinematics yet.
Initialization There is no initial trajectory. As mentioned in the article "The initial parameter vector is θ_init = 0, which means the arm is completely stretched, and not moving at all over time." Thus, the algorithm starts "from scratch" (as an infant), rather than from a demonstration.

Essentially, the assumptions one makes for robotics (attractor state as a goal, end-effector space generalizes better, warm start is good for optimization) are dropped in the Developmental Science work, because here we want to reduce the number of required "auxillary hypotheses" as much as possible to explain the observed data (the proximo-distal freezing/freeing over time).

Matlab implementation vs Python/C++ implementation.

As to the code, the results in that article were generated with an older version of the code which I implemented in Matlab. Later on I ported this to C++ (for real-time execution) and Python (mainly because I wanted to learn Python). A snapshot of the Matlab code (there is a link in the article) can be found here: https://github.com/stulp/dmp_bbo/archive/proximodistalmaturation.zip

With the Matlab code above, which is quite old by now, you should be able to reproduce the results from the paper right away. More fun for a student project would be to reimplement the Developmental Science paper in Python. To do that, the following classes/functions are relevant:

To represent the policy (don't use train(), but just rbfn.set modelweights=0) : https://github.com/stulp/dmpbbo/blob/master/python/functionapproximators/FunctionApproximatorRBFN.py
To run the optimization: https://github.com/stulp/dmpbbo/blob/master/python/bbo/runOptimization.py#L30
Use the updater that updates the covariance matrix based on the samples: https://github.com/stulp/dmpbbo/blob/master/python/bbo/updaters.py#L110
The task ViapointArm2D (used in the article) is currently only implemented in C++, it should be ported to Python: https://github.com/stulp/dmpbbo/tree/master/src/dmp_bbo/tasks
- Note that the conversion from joint angles to end-effector positions (forward kinematic) is done here: https://github.com/stulp/dmpbbo/blob/master/src/dmp_bbo/tasks/TaskSolverDmpArm2D.cpp#L101

If the resulting code is nice, we could add a directory demos_pdff to have the reimplementation in the new repo! (I would very much like that too, so I could provide some support for your project in case this is your aim)

pranshumalik14 commented 2 years ago

Thank you so much for such a wonderful breakdown of concepts involved! Yes, we have a compatible aim to be able to provide something like demos_pdff, and we are starting to work towards that. We will update this thread if we have any questions along the way, and will eventually link a PR to this issue. Thanks again for your kind support!

pranshumalik14 commented 2 years ago

Hi @stulp,

We noticed that in the developmental science paper code (MATLAB), the maturation solver uses linearpolicyintegrate instead of the weighted and normalized transformationintegrate to get the net activation/force term to then integrate the DMP states (joint angles). Equation 4 in the paper suggests using something similar to the latter. Is there something we are missing, or is this information accurate and it is fine if we follow this?

Theoretically, we don't think normalization would matter since in the end, we learn how much joint activation we need at a particular time and with/without normalization should just be a scaling difference.

We would really appreciate it if you could let us know if we are on the right track. Thanks!

pranshumalik14 commented 2 years ago

We are nearly there in understanding the basic details of the paper and its implementation, and just have another question; this time about the PI-BB algorithm. We were in the understanding that the output parameter, Θ, is a matrix with each row having weights for the radial basis functions for the respective joint (column number). In PI-BB, however, we see that we are optimizing per column vector of this matrix, i.e., per joint. If this is the case, then how are the roll-outs (and subsequent evaluations) optimizing for the "best" activation weights for all joints together? Meaning, which parameter vectors are chosen for rollouts for the joints other than the one being optimized, or are all joints undergoing the same procedure in parallel and the current sample for each column/joint is used for each roll-out? In that case, why would the costs be restricted to each joint in the optimization procedure and not look at the rest of the costs/joint to "communicate between each other"?

Any clarification on this will also be highly appreciated. We have attached a snapshot of the algorithm below.

stulp commented 2 years ago

We noticed that in the developmental science paper code (MATLAB), the maturation solver uses linearpolicyintegrate instead of the weighted and normalized transformationintegrate to get the net activation Theoretically, we don't think normalization would matter since in the end, we learn how much joint activation we need at a particular time and with/without normalization should just be a scaling difference.

Indeed. In practice, that should not matter at all. This introduces a small and irrelevant scale factor. Small because the max value of the kernels and the normalized kernels is both 1, and when evenly spaced, the sum over the former will usually not exceed that of the latter much. And this arbitrary scale factor is hardly relevant in the context of the experiments in the article, because we initialize the policy parameters (theta) with 0.

force term to then integrate the DMP states (joint angles).

Careful, there are no "forcing terms" and "DMP states" in the article, and they are not used in the code for the article. All that is used is a simple basis function policy that return accelerations. The internal (trivial) "simulator" then computes the velocities and positions from this acceleration with Euler integration.

pranshumalik14 commented 2 years ago

Understood that, thanks a lot! With the other question, we should be on track for complete implementation.

stulp commented 2 years ago

Your question about PI-BB is excellent!

Updating the mean

Suppose you have 7 joints and 5 basis functions per joint. Then there are 35 policy parameters to optimize. You can then do two things:

make 1 parameter vector with 35 values
make 7 parameter vectors with 5 values

And when you update the parameter vector with the same costs, it turns out that it boils down to same math, and they are equivalent. So optimizing them all in one vector, or parallel in 7 parameter vectors is the same. Of course assuming you use the same costs for all joints. The joints "communicate" through the costs, so to speak.

And note that that is a stochastic process! One joint may have done something "dumb", and the other joint something "smart". If the smart thing was smarter than the dumb thing was dumb, then it will have a lower cost, and both will be rewarded. That may be unfair that the dumb joint gets a lower cost also, but since the smarter one was smarter than the dumb one was dumb, in effect it goes in the right direction. I hope this makes sense ;-)

Updating the covariance matrix

Updating the covariance matrix is a different story though. Here the two options do something different:

the covariance matrix is 35x35 = 1225 parameters
the covariance matrix is 7x5x5 = 175 parameters

The matlab code maintains a different distribution for each "joint". Here is the loop over the distribution, where n_dof would be 7 in our example.

for i_dof=1:n_dofs
  covar = distributions_new(i_dof).covar;

This implementation is generic enought to allow both variants (1 distribution of 35x35, or 7 distribution of 5x5)

Note that the first covariance matrix can thus correlate exploration between joints, but the second cannot. One may certainly find examples where this matters. But so far, it hasn't really seemed to matter in practice. Apparently, for a good optimization it is more important to have more smaller, stable covariances matrices than one big one with many more parameters, than to be able to model covariances between joints. This is pragmatism speaking here.

To answer your questions

then how are the roll-outs (and subsequent evaluations) optimizing for the "best" activation weights for all joints together?

Because you take the weighted mean over each parameter, and this operation is independent for each parameter anyway. So it does not matter how you organize the parameters. It is the cost that is shared between all parameters, and that is what matters.

Meaning, which parameter vectors are chosen for rollouts for the joints other than the one being optimized, or are all joints undergoing the same procedure in parallel and the current sample for each column/joint is used for each roll-out?

Correct.

In that case, why would the costs be restricted to each joint in the optimization procedure and not look at the rest of the costs/joint to "communicate between each other"?

As you probably understand now, this is not the case. To drive the point home, when updating the mean of the distribution, there is not even "communication" between the different individual parameters!

Excellent questions, really. It's nice to chat about this, because few people have dug this deep ;-) Tell your teacher you should get a great mark!

stulp commented 2 years ago

So I've not looked at the Matlab code in a long time. But I think the C++ highlights nicely the parallel optimization process.

https://github.com/stulp/dmpbbo/blob/ca900e3b851d25faaf59ea296650370c70ed7d0f/src/dmp_bbo/runOptimizationTask.cpp#L298

It's notable how the evaluation loop loops over samples, but not over the parallel optimization in the "joints". That is because there is only one cost per sample, which is the same for all the distributions ("joints") in the optimization:

https://github.com/stulp/dmpbbo/blob/ca900e3b851d25faaf59ea296650370c70ed7d0f/src/dmp_bbo/runOptimizationTask.cpp#L306

I made that function deprecated a long time ago, because I figured out it can be implemented much more easily and compact. I should do that some time: #59

pranshumalik14 commented 2 years ago

Really appreciate your detailed answers, thanks so much again for making everything so clear! We'll help with issue #59 once our python implementation is neat and compact.

pranshumalik14 commented 2 years ago

Hi @stulp we have created an interactive demo on a Pluto notebook in Julia language here (https://pranshumalik14.github.io/intuitive-arm-reach/notebooks/pdff_impl.jl.html), with some minor modifications and some explanation. This can be run on the cloud or locally as well and allows for tweaking with all the involved parameters. A python version for our project is underway and will be used for extending this idea to a learned internal model.

Regarding the PDFF/PIBB implementation, we have tried to make it as compact and legible as possible.

stulp commented 2 years ago

Very nice notebook demo! Looking forward to the Python version.

Your conclusion is interesting: "We also show the resulting joint angle evolutions from which it is easy to observe that a naturalistic movement has been generated, first involving rotation of the proximal joints followed by the distal joints."

The PDFF effect as described in the journal article refers to the development of exploration within different joints during learning rather than then movements of the joints themselves during the learned motion. The effect you observe is thus different from the one in the article, but also very interesting.

pranshumalik14 commented 2 years ago

Thank you, and yes, the Python version is under development in the context of physical robots! We're also trying to interpolate the weights by training a model so that we don't have to run PDFF from scratch each time. Currently we have a simple Radial Basis Function MLP with some convolutional layers, just to try out this approach. If you have any suggestions in this regard, we would be all the more grateful!

What you pointed out is true, that behavior is used for self exploration during learning only, but like in the notebook, more often than not we also get the same bias as an output. This is especially true, we think when such a motion sequence can lead to the goal (quite likely), but some goals such as those really close to a particular joint will almost always result in earlier and higher activation of distal joints.

stulp / dmpbbo

DMP Implementation for PDFF Exploration #58

Robotics

Developmental Science article

Matlab implementation vs Python/C++ implementation.

Updating the mean

Updating the covariance matrix

To answer your questions