Solving with cuOSQP - GitHub Issues

gacamilo commented 2 years ago

I am trying to implement a controller that needs to run at 100 Hz, but after a lot of tuning I can only get solutions in ~55 ms. There is this version of OSQP that is CUDA accelerated (cuOSQP) and apparently it can speed up execution by up to two orders of magnitude for some problems.

I thought this looked pretty promising so I tried to quickly glue it into your library by replacing the OSQPData struct with a custom struct that is compatible with what libmpc and cuOSQP expect. cuOSQP expects the data struct in the picture below which is internally generated from the csc* and c_float* that are in use on the normal OSQP.

So I just added the struct below to LOptimizer.hpp and replaced data with this new type.

I think that should make all of libmpc keep its current behaviour and the code actually compiles and runs but the solver always returns NaN for some reason and the objective function is 0 on the first evaluation.

I don't think this is an issue for libmpc but I thought I'd post here and ask if you have any ideas as to what might be causing it. It would certainly be awesome if this library could be even better and faster by using cuOSQP.

nicolapiccinelli commented 2 years ago

Hi @camilogonzalez97, try running the linear example in the test folder and check whether it behaves the same. Regarding the vanilla libmpc, are you using static or dynamic allocation? Have you enabled OpenMP?

gacamilo commented 2 years ago

Hi @nicolapiccinelli, thank you for getting back to me. Yeah, I tried it with the example in the test folder and it also happened. However, I just discovered that this is all due to a casting issue. In the standard version of OSQP, c_float is defined as double, but the CUDA solver in cuOSQP requires disabling doubles, so c_float evaluates to float. When I started modifying libmpc I just changed these lines in LOptimizer.run():

data->q = mpcProblem.q.data();
data->l = mpcProblem.l.data();
data->u = mpcProblem.u.data();

For these lines:

data->q = (float*) mpcProblem.q.data();
data->l = (float*) mpcProblem.l.data();
data->u = (float*) mpcProblem.u.data();

I didn't give it a second thought at the time, but today I discovered that some numbers cast to NaN so the l and u vectors always end up with NaN entries. I'm not sure why it doesn't happen with q though.
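
Editor's note: a `(float*)` cast on a pointer to doubles does not convert any values; it makes the reader interpret each 8-byte double as two unrelated 4-byte floats. One possible explanation for the NaN entries (not confirmed in the thread) is that the bounds contained infinities: the bit pattern of a double infinity contains a 32-bit half that decodes as a float NaN, whereas a double 0.0 decodes as two float zeros. The sketch below is a minimal illustration using plain `std::vector` buffers; the helper names `reinterpreted_has_nan` and `to_float` are hypothetical and not part of libmpc or cuOSQP.

```cpp
#include <algorithm>
#include <cmath>
#include <cstring>
#include <limits>
#include <vector>

// Reinterpret the bytes of one double as two floats. This mimics what reading
// a double buffer through a float* does; memcpy keeps the demo well defined.
inline bool reinterpreted_has_nan(double value) {
    static_assert(sizeof(double) == 2 * sizeof(float),
                  "sketch assumes 64-bit double and 32-bit float");
    float halves[2];
    std::memcpy(halves, &value, sizeof(value));
    return std::isnan(halves[0]) || std::isnan(halves[1]);
}

// Element-wise value conversion into a separate float buffer.
// The returned vector must outlive any solver call that reads its data().
inline std::vector<float> to_float(const std::vector<double>& src) {
    std::vector<float> dst(src.size());
    std::transform(src.begin(), src.end(), dst.begin(),
                   [](double v) { return static_cast<float>(v); });
    return dst;
}
```

With this approach the solver would be handed `converted.data()` from a float buffer that stays alive for the duration of the solve, instead of a cast pointer. For Eigen vectors, `.cast<float>()` evaluated into a float vector should play the same role, which avoids switching the whole library to floats.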

I've changed the whole library to work with floats and the example in the test folder works. Now I need to change my application to work with floats and I'm optimistic that will do. Not the best solution but I haven't found any other way of casting safely. I'll let you know if cuOSQP does result in faster solutions. If it does I could make a branch designed to work with it.

nicolapiccinelli commented 2 years ago

If cuOSQP turns out to be a valuable solution for real-time control, we can certainly work on including it in the library. What kind of problem are you working on?

gacamilo commented 2 years ago

I'm working on a real-time implementation of an MPC-based Motion Cueing Algorithm.

Unfortunately, cuOSQP doesn't seem so promising. It returns completely different solutions from OSQP for the same problem and the solution takes many more iterations. I haven't given up on it yet but I'm not so sure that it will be a suitable replacement anymore.

gacamilo commented 2 years ago

I reached the end of the road with cuOSQP. It turns out there is a hard bottleneck in setting up the problem on the GPU. In my case this takes ~40 ms no matter what I try, so the 100 Hz goal was always a dream.

I ended up splitting my problem and going back to OSQP. I'm working with MSVC and in Release mode optimised for speed I managed to find a subset of problems that solve in ~15 ms. While trying to reduce this I found something interesting in libmpc.

In LOptimizer.hpp this line takes ~6 ms to run but when I time the get method inside ProblemBuilder.hpp it takes <0.1 ms. Looks like despite returning a reference a copy is being made so I modified the ProblemBuilder class and made mpcProblem public so that it can be accessed via builder->mpcProblem in LOptimizer. I also made get a void method. This brought down the solution time from ~15 ms to ~9 ms. Seems like a worthy improvement for the library, what do you think?
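
Editor's note: the thread does not show the line in question, so the following is only a sketch of one common way a getter that returns a reference still ends up copying: binding the result to a plain `auto` (or a by-value variable) drops the reference and invokes the copy constructor. The `Problem` and `Builder` types here are hypothetical stand-ins, not libmpc's actual classes.

```cpp
#include <vector>

// Global copy counter, kept in a function-local static for simplicity.
inline int& copy_count() {
    static int n = 0;
    return n;
}

// Hypothetical problem container that records every copy construction.
struct Problem {
    std::vector<double> q;
    Problem() = default;
    Problem(const Problem& other) : q(other.q) { ++copy_count(); }
    Problem& operator=(const Problem&) = default;
};

// Hypothetical builder whose getter returns a const reference.
class Builder {
public:
    const Problem& get() const { return problem_; }
private:
    Problem problem_;
};

// `auto` deduces Problem, not const Problem&, so this copies once.
inline int copies_with_value_binding(const Builder& b) {
    const int before = copy_count();
    auto p = b.get();
    (void)p;
    return copy_count() - before;
}

// `const auto&` keeps the reference, so no copy is made.
inline int copies_with_reference_binding(const Builder& b) {
    const int before = copy_count();
    const auto& p = b.get();
    (void)p;
    return copy_count() - before;
}
```

If the call site looked like the first function, changing the binding to a reference would remove the copy without exposing the member publicly.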

There is one more thing that I'm looking into atm which might shave another ms or two from the solution time. Instead of calling osqp_setup on every iteration, one could use the update functions in the OSQP C API to modify the q, l and u vectors. This is what they do in embedded mode and it also works here because P and A need only be set up once. I'll see how this goes.
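
Editor's note: the idea above can be sketched as a small wrapper that performs the full setup once and afterwards only pushes new vectors. To keep the example self-contained it uses a hypothetical `CountingBackend` instead of linking against OSQP; with the OSQP 0.6.x C API the corresponding calls would presumably be `osqp_setup` for the first load and `osqp_update_lin_cost` plus `osqp_update_bounds` for later ones. The class and method names are assumptions for illustration, not libmpc code.

```cpp
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for a QP solver backend. It only counts how often
// the expensive setup path and the cheap update path are taken.
struct CountingBackend {
    int setups = 0;
    int vector_updates = 0;
    Vec q, l, u;

    void setup(const Vec& q0, const Vec& l0, const Vec& u0) {
        q = q0; l = l0; u = u0;
        ++setups;
    }
    void update_vectors(const Vec& qn, const Vec& ln, const Vec& un) {
        q = qn; l = ln; u = un;
        ++vector_updates;
    }
};

// Runs the full setup on the first load and vector-only updates afterwards.
template <class Backend>
class ReusableSolver {
public:
    explicit ReusableSolver(Backend& backend) : backend_(backend) {}

    void load(const Vec& q, const Vec& l, const Vec& u) {
        if (!initialised_) {
            backend_.setup(q, l, u);
            initialised_ = true;
        } else {
            backend_.update_vectors(q, l, u);
        }
    }

    // Call when P or A change, so the next load performs a full setup again.
    void invalidate() { initialised_ = false; }

private:
    Backend& backend_;
    bool initialised_ = false;
};
```

The `invalidate` hook is there because vector-only updates are valid only while P and A stay fixed; a problem whose matrices change would need to fall back to a full setup (or a matrix update, if the backend offers one).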

nicolapiccinelli commented 2 years ago

@camilogonzalez97 I've investigated what you reported and found a solution that doesn't require making mpcProblem public; the overhead of the get function is now almost zero. I'm going to push this fix to master soon, thanks for pointing it out. As for avoiding the call to osqp_setup on every iteration, that would only work for standard MPC problems. In adaptive or time-varying MPC, the matrices P and A vary over time. In any case, the overhead of recreating the OSQP problem doesn't seem that big.

nicolapiccinelli / libmpc

Solving with cuOSQP #1