wbm_getState returns NaN in the vector-quaternion transformation

VModugno commented 7 years ago

Hi guys,

right now I'm using the mex_wholebody toolbox extensively for a trajectory optimization task. To find a suitable solution I need to repeat the simulation of the experiment several times, and the simulation can stop early because of numerical instability. My code starts by calling wbm_modelInitialize('icubGazeboSim'), and then I perform all subsequent operations on that instance of the iCub model. If one repetition of the simulation ends because of numerical instability, wbm_getState becomes inconsistent afterwards and starts to return NaN in the vector-quaternion transformation. The only way to fix this is to close Matlab and open it again. I observed this issue on both my machines, and if you need to reproduce the problem I can provide a code snippet. Thank you for your help.

@serena-ivaldi

traversaro commented 7 years ago

Can you attach to this issue the code to reproduce the problem? Thanks!

VModugno commented 7 years ago

I modified one of your controller examples inside the mex-wholebodymodel/Controllers folder in codyco-superbuild/main. First, copy the TorqueBalancingError folder into the controllers folder. To generate the error, just launch initializeTorqueBalancing.m once; it will end with an error. Then launch it again and, with a breakpoint in wbm_getWorldFrameFromFixLnk.m on line 37, check the value of ow_vqT_b returned by wbm_getState() in computeNewWorld2Base(). You should see a set of NaN values.

torqueBalancingError.tar.gz

traversaro commented 7 years ago

Hi @VModugno, as you may imagine dealing with such a tarball is quite difficult.

If you could commit your code to a git repository, even just a branch on your personal fork of this repo, it would be much easier to understand the changes that you applied to the code. Thanks a lot!

Anyhow, @gabrielenava probably has some insight on what could be the cause of the problem.

Ganimed commented 7 years ago

Hi @VModugno,

I will check the error on the weekend. At the moment I don't have much time for coding, because at the beginning of next week my grandmother's house will be torn down with a caterpillar. The last three weeks of preparing the house were tough.

As a workaround in the meantime, try calling clear mexWholeBodyModel or clear all before you repeat the simulation. This saves time compared with restarting Matlab. @VModugno, is your code stored in the learnOptimWBC repository?
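
A minimal sketch of this workaround, assuming each repetition is wrapped in a function handle. `runOnce` below is a hypothetical stand-in that always fails; it is not part of the toolbox. Clearing the MEX function by name unloads it, which should discard whatever state it keeps between calls:

```matlab
% Hypothetical stand-in for one repetition of the simulation; it always
% fails, to mimic an early stop caused by numerical instability.
runOnce = @() error('sim:diverged', 'simulation stopped: state contains NaN');

nRuns   = 3;
nFailed = 0;
for k = 1:nRuns
    try
        runOnce();
    catch err
        nFailed = nFailed + 1;
        fprintf('run %d failed: %s\n', k, err.message);
        % Unload the MEX file so the next run starts from a fresh model.
        % If no function of that name is loaded, clear does nothing.
        clear mexWholeBodyModel
    end
end
```

After the clear, the model presumably has to be initialized again (e.g. with wbm_modelInitialize('icubGazeboSim')) before the next repetition.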

Ganimed commented 7 years ago

@VModugno: I have copied your folder torqueBalancingError into the path codyco-superbuild/main/controllers. Then I called the method initializeTorqueBalancing in Matlab R2016b, but I get a completely different error from the one you described (see below), so at the moment I'm not able to reproduce your error. I also have to mention that I'm currently using the old "yarpWholeBodyModelV1.h" in the C++ code instead of the new version V2. The corresponding header file is located under: codyco-superbuild/libraries/yarpWholeBodyInterface/include/yarpWholeBodyInterface/yarpWholeBodyModel.h. On the weekend I will recompile the C++ code with yarpWholeBodyModelV2.h. Maybe afterwards I will be able to reproduce your error.

> In runController (line 33)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancing (line 144) 
Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN. 
> In stackOfTaskController (line 48)
  In runController (line 46)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancing (line 144) 
Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN. 
> In stackOfTaskController (line 52)
  In runController (line 46)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancing (line 144) 
Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN. 
Error using svd
Input to SVD must not contain NaN or Inf.

Error in pinv (line 18)
[U,S,V] = svd(A,'econ');

Error in stackOfTaskController (line 94)
    pinvA  = pinv(A,pinv_tol);

Error in runController (line 46)
controlParam           = stackOfTaskController(CONFIG,gain,trajectory,DYNAMICS,FORKINEMATICS,STATE);

Error in forwardDynamics (line 64)
controlParam    = runController(gain,trajectory,DYNAMICS,FORKINEMATICS,CONFIG,STATE);

Error in initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
forwardDynFunc        = @(t,chi)forwardDynamics(t,chi,CONFIG);

Error in euleroForward (line 26)
    chi(:,k) = chi(:,k-1) + tstep.*func(t(k-1),chi(:,k-1));

Error in initForwardDynamics (line 74)
    [t,chi]           = euleroForward(forwardDynFunc,chiInit,CONFIG.tEnd,CONFIG.tStart,CONFIG.sim_step);

Error in initializeTorqueBalancing (line 144)
initForwardDynamics(CONFIG);

gabrielenava commented 7 years ago

Don't worry @ganimed, take all the time you need!

My suggestion is exactly what @ganimed suggested. The function wbm_getState should be a "reader" of the robot state previously stored using other functions (such as wbm_updateState). So, the first thing is to verify that you are always correctly updating the robot state before calling wbm_getState, otherwise it may return NaN. I remember I implemented this "updateState" feature in the controller, but I will double-check this anyway. The second thing is to try clear all: clear alone is not able to erase some variables (e.g. persistent or global) that might be used inside the library.
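
One way to catch a stale or unset state early is to validate the vector-quaternion transformation right after reading it. A minimal sketch, assuming a 7-element layout (3 position entries followed by a unit quaternion) and using made-up values in place of a real wbm_getState call; `isValid` is a hypothetical helper, not a toolbox function:

```matlab
% Stand-in values for the first output of wbm_getState (assumed layout:
% 3 position entries followed by 4 quaternion entries).
vqT_ok  = [0; 0; 0.6; 1; 0; 0; 0];
vqT_bad = nan(7, 1);                 % what a diverged state looks like

% A transformation is usable if every entry is finite and the
% quaternion part has unit norm.
isValid = @(vqT) all(isfinite(vqT)) && abs(norm(vqT(4:7)) - 1) < 1e-6;

okFlag  = isValid(vqT_ok);
badFlag = isValid(vqT_bad);
```

Calling such a check at the start of every repetition makes a bad state fail loudly instead of propagating NaN into the controller.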

As @traversaro suggested, feel free to fork this repository or create a branch (if you have permissions). This is also because I'm going to merge into the master branch some features I recently implemented, and it may be a problem for you to keep adapting your code to stay up to date with the main code.

gabrielenava commented 7 years ago

@Ganimed this error seems related to the integrator. Can you please check if the option CONFIG.integrateWithFixedStep in initializeTorqueBalancing.m is set to 1?

gabrielenava commented 7 years ago

I got the answer myself, and the answer is: yes! In the error you posted I see that the function euleroForward is called instead of ode15s:

Error in initForwardDynamics (line 74)
    [t,chi]           = euleroForward(forwardDynFunc,chiInit,CONFIG.tEnd,CONFIG.tStart,CONFIG.sim_step);

So, the error @Ganimed had is definitely related to this. Please consider that, as pointed out in initializeTorqueBalancing.m, the option CONFIG.integrateWithFixedStep is an advanced option only for code developers, as the default integrator is ode15s, which is able to deal with "stiff" problems. The fixed-step integrator is used for debugging, since ode15s may get stuck for some reason and then it is difficult to understand whether the robot is behaving unstably or the integration is just slow. Integrating with a fixed step is a way to "make the integration faster", but it may lead to the error you posted above. So it must be used only after setting a proper integration step and after adding a "desingularization" of the system's mass matrix (this is done automatically if the fixed step option is ON, and it actually turns out that this desingularization acts exactly like including the motor inertias in the joint dynamics, but this is another story). Also consider that in the upcoming release this debugging option is deprecated, and an "online visualizer" is used instead for detecting unstable behaviours.
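
To illustrate why a fixed step can end in NaN, here is a minimal sketch on a scalar stiff test problem rather than the robot dynamics. The update line mirrors the one visible in the stack trace; the test equation, `lambda` and the step size are assumptions chosen for the demonstration, and this is not the toolbox's euleroForward function:

```matlab
% Stiff scalar test problem dx/dt = lambda*x; the exact solution decays to 0.
lambda = -1000;
f      = @(t, x) lambda .* x;
x0     = 1;
tStart = 0;  tEnd = 5;  tStep = 0.01;

% Fixed-step forward Euler, same update rule as the line in the stack trace.
t      = tStart:tStep:tEnd;
chi    = zeros(1, numel(t));
chi(1) = x0;
for k = 2:numel(t)
    chi(:,k) = chi(:,k-1) + tStep .* f(t(k-1), chi(:,k-1));
end

% Each step multiplies the state by (1 + tStep*lambda) = -9, so the
% iterate overflows to Inf and then Inf - Inf yields NaN.
eulerHasNaN = any(isnan(chi));

% Variable-step stiff solver on the same problem.
[~, chiStiff] = ode15s(f, [tStart tEnd], x0);
stiffIsFinite = all(isfinite(chiStiff));
```

For this test equation forward Euler is only stable when tStep < 2/|lambda| = 0.002, which is what "setting a proper integration step" amounts to here.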

VModugno commented 7 years ago

I will fork the mex-wholebody repo and add the code from the tarball to the fork. Currently I'm using one of the latest versions of the code (I pulled it on 20 March). To generate the error I used one of the controllers in mex-wholebody, to be sure that all the operations were done in the right order. The only thing I changed is the trajectory generation. Sorry @traversaro, when you told me to attach the code to the issue I took it literally :) . BTW @traversaro @Ganimed @gabrielenava thank you for your help.

traversaro commented 7 years ago

> Sorry @traversaro when you told me to attach the code to the issue i took it literally :)

No problem!

Ganimed commented 7 years ago

@VModugno you can also try the latest version of the WBM-Class branch of the mexWholeBodyModel repository. About a month ago I made some small corrections in this branch to the C++ code of the mex-subroutine. These corrections are not merged into master yet. Also try to switch back to the old "yarpWholeBodyModelV1.h", as I described above, and recompile everything. Maybe this will change the result. In the last weeks/months a lot of changes and improvements were made in the yarpWholeBodyInterface, which caused some issues in my code. So for now I have gone back temporarily to the old version until the changes in the new version are completed.

I fully agree with @gabrielenava. Try to use the ode15s method from Matlab. It is far more robust than the experimental euleroForward function. Also check all the calls to wbm_getState and wbm_updateState in your code, i.e. whether you have placed the input and output variables in the correct order. In the new version the output order of wbm_getState has been changed to [vqT_b, q_j, v_b, dq_j].

@gabrielenava: I recommend adding a note to the descriptions that the euleroForward function is still experimental, especially when CONFIG.integrateWithFixedStep is activated, and that only advanced users or developers should use this function with this configuration. Everyone else, i.e. normal users, should start with the ode15s method for their simulations/experiments.

VModugno commented 7 years ago

Hi guys, I created an ad-hoc repo where you can find a working example to reproduce the issue. Just run initializeTorqueBalancingErr.m once. It will fail. Then, in the second run, check what happens in wbm_getWorldFrameFromFixLnk.m. Here is the repo: https://github.com/VModugno/test_code

gabrielenava commented 7 years ago

I tried your code, and I discovered I have a license problem:

To use 'sym', the following product must be both licensed and installed:
  Symbolic Math Toolbox

Error in AdHocTraj (line 9)
   t = sym('t');

Error in initializeTorqueBalancingErr (line 103)
CONFIG.references = AdHocTraj(time_struct,start_param);

In the lab our Matlab license doesn't include the Symbolic Math Toolbox... I'm going to test the code on my personal computer at home then.

VModugno commented 7 years ago

I updated the repo and removed the Symbolic Math Toolbox dependency; it should be fine now.

Ganimed commented 7 years ago

I can't download anything. Is the given link broken?

VModugno commented 7 years ago

This is the working link: https://github.com/VModugno/test_code

Ganimed commented 7 years ago

Hi @VModugno,

I have downloaded and tested your repo. At the moment I can't reproduce your error. Nothing has changed: I get the same error with the euleroForward function as before (see below). In the evening I will try to run further tests. Then we will see.

Did you update everything to the latest versions (YARP, YCM, codyco-superbuild, mex-WBM, etc.)? Try this and recompile everything from scratch. But before doing this, first run make clean everywhere and then make update-all.

mexWholeBodyModel started with robot: icubGazeboSim, Num of Joints: 25
> In runController (line 33)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancingErr (line 143) 
Warning: Matrix is singular to working precision. 
> In stackOfTaskController (line 48)
  In runController (line 46)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancingErr (line 143) 
Warning: Matrix is singular to working precision. 
> In stackOfTaskController (line 52)
  In runController (line 46)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancingErr (line 143) 
Warning: Matrix is singular to working precision. 
> In stackOfTaskController (line 115)
  In runController (line 46)
  In forwardDynamics (line 64)
  In initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
  In euleroForward (line 26)
  In initForwardDynamics (line 74)
  In initializeTorqueBalancingErr (line 143) 
Warning: Matrix is singular to working precision. 
Error using svd
Input to SVD must not contain NaN or Inf.

Error in pinv (line 18)
[U,S,V] = svd(A,'econ');

Error in stackOfTaskController (line 118)
pinvLambda         = pinv(Lambda,pinv_tol);

Error in runController (line 46)
controlParam           = stackOfTaskController(CONFIG,gain,trajectory,DYNAMICS,FORKINEMATICS,STATE);

Error in forwardDynamics (line 64)
controlParam    = runController(gain,trajectory,DYNAMICS,FORKINEMATICS,CONFIG,STATE);

Error in initForwardDynamics>@(t,chi)forwardDynamics(t,chi,CONFIG) (line 70)
forwardDynFunc        = @(t,chi)forwardDynamics(t,chi,CONFIG);

Error in euleroForward (line 26)
    chi(:,k) = chi(:,k-1) + tstep.*func(t(k-1),chi(:,k-1));

Error in initForwardDynamics (line 74)
    [t,chi]           = euleroForward(forwardDynFunc,chiInit,CONFIG.tEnd,CONFIG.tStart,CONFIG.sim_step);

Error in initializeTorqueBalancingErr (line 143)
initForwardDynamics(CONFIG);

>>

VModugno commented 7 years ago

As I wrote before, after experiencing the error (you get the error not just because I'm using the Euler integration, but because the desired CoM trajectory I provide is designed to induce it) you should run the code a second time and check the value that comes out of wbm_getWorldFrameFromFixLnk.

Ganimed commented 7 years ago

Hi @VModugno,

sorry for the late response, but during the day I'm working on the building site. Maybe I didn't express myself clearly. I ran the code several times in a row, and every time it ended with the above error. I did not get any problems with wbm_getWorldFrameFromFixLnk.

At the moment the C++ part uses yarpWholeBodyModelV1.h in the background instead of the new yarpWholeBodyModelV2.h. When I have time I will recompile the complete C++ code with the new version V2. If your error shows up then, the yarpWholeBodyModelV2 in C++ has a hidden bug somewhere.

VModugno commented 7 years ago

Btw, I have found where the problem is. It originates from the fact that, in the class ModelState, wf_H_b stores the last value from the previous execution. So if the last execution ended with a numerical error, the values stored in wf_H_b will be a bunch of NaN. When you retrieve the value the first time get-state is called, you end up with the NaN values from the previous run. To fix it, I was thinking of creating a new ModelComponent that provides a function to reinitialize the value in wf_H_b.
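
The mechanism can be mimicked in plain Matlab. A minimal sketch, assuming a containers.Map handle object as a stand-in for the state that the MEX file keeps alive between runs and a 4x4 matrix as the stored transform; the key name mirrors wf_H_b, but none of this is the toolbox's actual code:

```matlab
% Handle object that outlives a single "run", like the state kept by the MEX file.
state = containers.Map();
state('wf_H_b') = eye(4);          % initial transform (assumed 4x4 here)

% Run 1: the simulation diverges and the last stored value is all NaN.
state('wf_H_b') = nan(4);

% Run 2 without a reset: the first read returns the stale NaN values.
firstRead   = state('wf_H_b');
staleHasNaN = any(isnan(firstRead(:)));

% Run 2 with a reset to the identity before anything is read.
state('wf_H_b') = eye(4);
afterReset   = state('wf_H_b');
resetIsClean = isequal(afterReset, eye(4));
```

Resetting the stored value at the start of each run is the same idea as the reinitialization function proposed here, only on a toy object.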

VModugno commented 7 years ago

Now the code is working. I wrapped the new function in Matlab and have performed a few tests. I believe there are cleaner ways to fix it, so if you have suggestions I will be glad to do the implementation.

gabrielenava commented 7 years ago

Ok, happy to see the problem is solved! I think @Ganimed worked a lot on the .cpp library, so let's wait for his opinion before closing this issue. Concerning the numerical integration, if it is not fundamental for your results I suggest you switch back to ode15s anyway. Otherwise we can discuss this topic in a new issue: sometimes the long time the ODE solver needs for integration is an issue for me too, and I was trying to find a solution which does not involve fixed-step integration.

Ganimed commented 7 years ago

Hi all,

I'm now back from the building site. At the moment I'm on the building site every day from 7 a.m. until 6-7 p.m., so I can only read the answers with a long delay. But next week the house will be completely removed and I will be free.

@VModugno I checked your hint against my current code. You are right: for the reinitialization I forgot to reset the variable wf_H_b in the class ModelState. Creating a new ModelComponent is a solution, but not a clean one, because without freeing the previously allocated memory it would compromise the memory manager of Matlab. There is a more elegant way to reset wf_H_b. I will fix this bug.

Ganimed commented 7 years ago

I have fixed the bug. I will soon upload the correction in C++ together with some further Matlab code for the WBM-Class.

VModugno commented 7 years ago

@Ganimed If you tell me how you want to fix this, I can help to update the code. As I told you before, for now I have just created a new function in Matlab that basically resets the value of wf_H_b to the identity. So if you call this new function at the beginning of the code, the problem is solved. If you have a better approach, let me know. @gabrielenava you are right, the simulation can be cumbersome, but in my opinion there are not many strategies at our disposal to speed up the simulation process. If you have some ideas I will be glad to discuss this topic with you in another issue.

Ganimed commented 7 years ago

@VModugno at the moment I'm working all day, every day, at the building site. We have torn down my grandmother's house. This was very emotional for our family. Currently I can only read the messages with a long delay.

I have solved the problem directly in the C++ code of the mex-subroutine. The correction was only two lines. Once I have uploaded the corrected version (hopefully this weekend), you won't need to call your wbm_resetWorldFrame method anymore.

gabrielenava commented 7 years ago

Hi @Ganimed, any news on this fix?

Ganimed commented 7 years ago

I will upload the fix today. For the last 4 days I was coding in other places. The work on the building site has cost a lot of time.

Ganimed commented 7 years ago

Update: The part of the Matlab code I'm currently working on is still not finished. As soon as this part is working well, I will upload the complete set of changes, including the bugfix.

Ganimed commented 7 years ago

Update: I have now uploaded the bugfix with some new Matlab-functions to the branch WBM-Class.

@VModugno: Try to use the bugfixed version without your wbm_resetWorldFrame method. If it works without errors, then we can close this issue.

VModugno commented 7 years ago

I'm sorry for the late answer, I will try it asap.

Ganimed commented 6 years ago

The bug was already removed from the C++ code last year - closing this issue.

robotology-legacy / mex-wholebodymodel

wbm_getState returns NaN in the vector-quaternion transformation #87