Closed OxDuke closed 2 years ago
Dear @OxDuke,
Thanks a lot for your feedback. It is pretty helpful for us to improve hpp-fcl.
At first glance, your benchmark is really simple, maybe a bit too simple to really mean something. If you look at the benchmarks done in the RSS paper, they are run over thousands of random configurations. In addition, we compare not only the timings but also the number of iterations between GJK and its accelerated versions. What is sure from our side is that there are a lot of memory allocations within hpp-fcl, which we in fact inherit from the original project and are aiming to remove.
From a benchmarking perspective, one should decouple the algorithms from their implementations. It would be nice if you could provide the benchmarks against libccd and OpenGJK. Certainly, their implementations are much more tailored than ours.
To move forward, could you share your entire code?
Thanks in advance, Justin
Hello @OxDuke,
Thank you for taking the time to make these comparisons! As @jcarpent wrote, it would be great if you could share your entire code so that we can make fair comparisons with the results presented in the paper. A few things could explain why Nesterov-accelerated GJK is worse than GJK in the case of the code snippet above:
1. The line cvx1_transform.translation() << 0, 0, 1.5; may mean that the distance separating the shapes is large. In the paper, we explain that the Nesterov acceleration really benefits GJK when the shapes are in close proximity. When the shapes are far from one another, GJK would likely not be called at all, as it is too expensive: the goal of the broadphase is to make sure we don't check intersection / don't compute the distance of distant objects. As explained in the above table's caption, the results in the table are obtained when the objects are close to one another.
2. In the paper's benchmarks, gjk.evaluate(...) is called directly: the memory allocation of the Minkowski difference as well as the initial support/initial GJK guess were removed. This memory allocation probably increases the running time compared to the timings found in the table.
3. The default initial guess is (1, 0, 0), which is a bad initial guess for both GJK and Nesterov-accelerated GJK. In the paper, we explain that we use init_guess = c1 - c2, where c1 and c2 are the centers of the objects' AABBs.
Regarding the performance of HPPFCL compared to libccd, could you share with us which libccd function you used to compare to HPPFCL? For example, libccd only does the boolean collision check when ccdGJKIntersect is called. If I remember correctly, the user needs to specify to ccd if they want to compute the distance between the shapes.
In the snippet above, GJK is used to compute the distance between the two shapes. It thus goes beyond the boolean collision check, which increases the execution time. To use HPPFCL's GJK in boolean checking mode, you need to call gjk.setDistanceEarlyBreak(0.).
Best, Louis
Another remark regarding the convergence criterion of GJK.
Currently the default of HPPFCL is GJKConvergenceCriterion::VDB. We kept this default so as not to break HPPFCL users' code.
On the other hand, to compute distances between shapes, OpenGJK uses the Frank-Wolfe duality gap as a convergence criterion, which is different from the VDB criterion; this duality-gap criterion is mentioned in the paper and used for the experiments.
This corresponds to setting gjk.convergence_criterion = GJKConvergenceCriterion::DualityGap and gjk.convergence_criterion_type = GJKConvergenceCriterionType::Absolute.
The convergence criteria impact the running time as they give different guarantees of optimality. The advantage of the absolute duality-gap criterion is that if a tolerance of eps is required from GJK, then the 2-norm distance to the optimal solution is guaranteed to be less than sqrt(eps) when GJK has converged.
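For intuition on where the sqrt(eps) bound comes from, here is a short derivation sketch using standard Frank-Wolfe arguments (the notation is mine, not necessarily the paper's):

```latex
% GJK minimizes f(x) = \|x\|^2 over the Minkowski difference D,
% with support point s(x) = \arg\min_{y \in D} \nabla f(x)^\top y.
% The Frank-Wolfe duality gap upper-bounds the suboptimality:
g(x) \;=\; \nabla f(x)^\top \big(x - s(x)\big) \;\ge\; f(x) - f(x^\star).
% Since f is 2-strongly convex and \nabla f(x^\star)^\top (x - x^\star) \ge 0
% at the optimum x^\star, we get
\|x - x^\star\|^2 \;\le\; f(x) - f(x^\star) \;\le\; g(x).
% So stopping when g(x) \le \epsilon guarantees \|x - x^\star\| \le \sqrt{\epsilon}.
```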
Hi @OxDuke,
I took the script you gave us above and created a quick benchmark to compare Nesterov-accelerated GJK and GJK on two shapes, similar to what is done in the RSS paper. The task in this benchmark is boolean collision checking, similar to what you would find in libccd. You can find the code below.
I used meshes from the example-robot-data repository, more specifically meshes from the Panda robot. There are two kinds of meshes in this repo: collision and visual meshes. Visual meshes contain more vertices, so you can change the code below to increase the number of vertices and see how the algos behave.
To benchmark both algos, the key is to:
- give both solvers the same maximum number of iterations, tolerance, and convergence criterion;
- set the distance early break to 0, so that both solvers do boolean collision checking;
- control the separating distance between the shapes, and average the timings over many random relative poses.
I tested this code on a machine with an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz and 64GB of RAM. I am running Manjaro; the code is compiled in Release mode, using clang-13. I get the following results:
Num vertices shape 1: 102
Num vertices shape 2: 152
---- Distance: -0.1 meters ----
Mean running time (in us)
GJK: 0.832189
Nesterov: 0.985011
---- Distance: -0.01 meters ----
Mean running time (in us)
GJK: 0.911282
Nesterov: 0.909
---- Distance: -0.001 meters ----
Mean running time (in us)
GJK: 1.1407
Nesterov: 0.828814
---- Distance: -0.0001 meters ----
Mean running time (in us)
GJK: 1.17485
Nesterov: 0.833695
---- Distance: 0.0001 meters ----
Mean running time (in us)
GJK: 1.07502
Nesterov: 0.601499
---- Distance: 0.001 meters ----
Mean running time (in us)
GJK: 0.852024
Nesterov: 0.474011
---- Distance: 0.01 meters ----
Mean running time (in us)
GJK: 0.408068
Nesterov: 0.330122
---- Distance: 0.1 meters ----
Mean running time (in us)
GJK: 0.290783
Nesterov: 0.294872
---- Distance: 1 meters ----
Mean running time (in us)
GJK: 0.288546
Nesterov: 0.295541
In conclusion, I obtain the same results as in the paper: when the shapes are in close proximity or shallowly overlapping, the Nesterov acceleration of GJK significantly improves over GJK. When the shapes are distant (1 meter) or have a large overlap (10 cm), GJK is competitive with or better than the Nesterov acceleration. Our goal with HPPFCL is to exploit the Nesterov momentum in the cases where it works best. These cases are important in physics simulation, especially when dealing with contact scenarios such as those occurring in manipulation tasks.
Regarding the absolute execution time, I find that, for boolean collision checks, HPPFCL is well below 2.5 microsec when the shapes are 1 meter apart (I suppose this distance corresponds to the translation in the code snippet you provided above). In fact, I find that the mean running time is actually 0.3 microsec. This is of course impacted by the number of vertices in the meshes. However, even with the visual meshes, I find a running time of 0.5 microsec for a distance of 1 meter.
Here is the code for the quick benchmark:
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <hpp/fcl/narrowphase/gjk.h>
#include <hpp/fcl/shape/convex.h>
#include <hpp/fcl/shape/geometric_shapes.h>
#include "hpp/fcl/mesh_loader/loader.h"
#include "hpp/fcl/BV/AABB.h"
#include "hpp/fcl/BVH/BVH_model.h"
#include "utility.h"

using hpp::fcl::support_func_guess_t;
using hpp::fcl::Transform3f;
using hpp::fcl::Vec3f;
using hpp::fcl::FCL_REAL;
using hpp::fcl::NODE_TYPE;
using hpp::fcl::BV_AABB;
using hpp::fcl::MeshLoader;
using hpp::fcl::BVHModelPtr_t;
using hpp::fcl::ConvexBase;
using hpp::fcl::GJKVariant;
using hpp::fcl::GJKConvergenceCriterion;
using hpp::fcl::GJKConvergenceCriterionType;
using hpp::fcl::Timer;
using hpp::fcl::details::GJK;
using hpp::fcl::details::MinkowskiDiff;

std::shared_ptr<ConvexBase> loadConvexShape(const std::string& file_name) {
  NODE_TYPE bv_type = BV_AABB;
  MeshLoader loader(bv_type);
  BVHModelPtr_t bvh1 = loader.load(file_name);
  bool _ = bvh1->buildConvexHull(true, "Qt");
  return bvh1->convex;
}

std::vector<Transform3f> generateRandomTransforms(int n_transfos) {
  FCL_REAL extents[] = {-1., -1., -1., 1., 1., 1.};
  std::vector<Transform3f> transforms;
  generateRandomTransforms(extents, transforms, static_cast<size_t>(n_transfos));
  // The translations are set such that the shapes are not in contact.
  // This allows us to control the distance between the shapes afterwards.
  for (size_t i = 0; i < transforms.size(); i++) {
    Vec3f new_translation = transforms[i].getTranslation();
    new_translation.noalias() = 3 * new_translation / new_translation.norm();
    transforms[i].setTranslation(new_translation);
  }
  return transforms;
}

FCL_REAL evalSolver(GJK& solver, const ConvexBase* shape1, const ConvexBase* shape2,
                    std::vector<Transform3f>& transforms, FCL_REAL distance) {
  // -- Pose shape 1
  Transform3f tf1 = Transform3f::Identity();
  // -- Minkowski difference and init guess/support
  MinkowskiDiff mink_diff;
  Vec3f init_guess;
  support_func_guess_t init_support_guess;
  Timer timer;
  // Solver used to separate the shapes at `distance`
  GJK separator(128, 1e-8);
  separator.convergence_criterion = GJKConvergenceCriterion::DualityGap;
  separator.convergence_criterion_type = GJKConvergenceCriterionType::Absolute;
  // -- Loop through all the poses
  FCL_REAL mean_time = 0.;
  FCL_REAL n_problem_solved = 1.;
  for (size_t i = 0; i < transforms.size(); i++) {
    Transform3f T = transforms[i];
    mink_diff.set(shape1, shape2, tf1, T);
    init_guess.noalias() = shape1->aabb_local.center() -
        (mink_diff.oR1 * shape2->aabb_local.center() + mink_diff.ot1);
    init_support_guess.setZero();
    GJK::Status status = separator.evaluate(mink_diff, init_guess, init_support_guess);
    // This allows us to set the distance that separates the shapes
    if (status == GJK::Valid && separator.ray.norm() > 0) {
      Vec3f normal = separator.ray / separator.ray.norm();
      Vec3f dt = separator.ray - distance * normal;
      Vec3f new_translation = T.getTranslation() + dt;
      T.setTranslation(new_translation);
      // We record the execution time
      FCL_REAL mean_time_problem = 0.;
      for (size_t j = 0; j < 100; j++) {
        timer.stop();
        timer.start();
        mink_diff.set(shape1, shape2, tf1, T);
        init_guess.noalias() = shape1->aabb_local.center() -
            (mink_diff.oR1 * shape2->aabb_local.center() + mink_diff.ot1);
        init_support_guess.setZero();
        status = solver.evaluate(mink_diff, init_guess, init_support_guess);
        timer.stop();
        mean_time_problem += timer.elapsed().user;
      }
      mean_time_problem /= 100;
      mean_time += mean_time_problem;
      n_problem_solved += 1;
    }
  }
  mean_time /= n_problem_solved;
  return mean_time;
}

int main() {
  const std::string& file_name_1 = "/home/louis/dev/control/example-robot-data/robots/panda_description/meshes/collision/link0.stl";
  // const std::string& file_name_1 = "/home/louis/dev/control/example-robot-data/robots/panda_description/meshes/visual/link0.dae"; // these visual meshes have more vertices
  std::shared_ptr<ConvexBase> shape1 = loadConvexShape(file_name_1);
  std::cout << "Num vertices shape 1: " << shape1->num_points << std::endl;
  shape1->computeLocalAABB();

  const std::string& file_name_2 = "/home/louis/dev/control/example-robot-data/robots/panda_description/meshes/collision/link1.stl";
  // const std::string& file_name_2 = "/home/louis/dev/control/example-robot-data/robots/panda_description/meshes/visual/link1.dae"; // these visual meshes have more vertices
  std::shared_ptr<ConvexBase> shape2 = loadConvexShape(file_name_2);
  std::cout << "Num vertices shape 2: " << shape2->num_points << std::endl;
  shape2->computeLocalAABB();

  // -- Generate random poses
  int n_transfos = 500;
  std::vector<Transform3f> transforms = generateRandomTransforms(n_transfos);

  // -- Solvers
  const unsigned int max_iterations = 128;
  const hpp::fcl::FCL_REAL tolerance = 1e-8;
  GJK gjk(max_iterations, tolerance);
  gjk.setDistanceEarlyBreak(0.); // for boolean collision checking
  gjk.gjk_variant = GJKVariant::DefaultGJK;
  gjk.convergence_criterion = GJKConvergenceCriterion::DualityGap;
  gjk.convergence_criterion_type = GJKConvergenceCriterionType::Absolute;

  GJK nesterov(max_iterations, tolerance);
  nesterov.gjk_variant = GJKVariant::NesterovAcceleration;
  nesterov.setDistanceEarlyBreak(0.); // for boolean collision checking
  nesterov.convergence_criterion = GJKConvergenceCriterion::DualityGap;
  nesterov.convergence_criterion_type = GJKConvergenceCriterionType::Absolute;

  // -- Distances between shapes
  std::vector<FCL_REAL> distances = {-0.1, -0.01, -0.001, -0.0001,
                                     0.0001, 0.001, 0.01, 0.1, 1.};

  // -- Loop through all the distances
  for (size_t d = 0; d < distances.size(); d++) {
    FCL_REAL mean_time_gjk = evalSolver(gjk, shape1.get(), shape2.get(),
                                        transforms, distances[d]);
    FCL_REAL mean_time_nesterov = evalSolver(nesterov, shape1.get(), shape2.get(),
                                             transforms, distances[d]);
    std::cout << "---- Distance: " << distances[d] << " meters ----" << std::endl;
    std::cout << "Mean running time (in us)" << std::endl;
    std::cout << "    GJK: " << mean_time_gjk << std::endl;
    std::cout << "    Nesterov: " << mean_time_nesterov << "\n" << std::endl;
  }
}
Best, Louis
Here are the results I get for the visual meshes (they need to be uncommented in the code above):
Num vertices shape 1: 499
Num vertices shape 2: 1590
---- Distance: -0.1 meters ----
Mean running time (in us)
GJK: 2.09625
Nesterov: 2.53351
---- Distance: -0.01 meters ----
Mean running time (in us)
GJK: 2.192
Nesterov: 2.0687
---- Distance: -0.001 meters ----
Mean running time (in us)
GJK: 2.64804
Nesterov: 1.73103
---- Distance: -0.0001 meters ----
Mean running time (in us)
GJK: 3.07247
Nesterov: 1.71276
---- Distance: 0.0001 meters ----
Mean running time (in us)
GJK: 2.69307
Nesterov: 1.29568
---- Distance: 0.001 meters ----
Mean running time (in us)
GJK: 1.92687
Nesterov: 0.841321
---- Distance: 0.01 meters ----
Mean running time (in us)
GJK: 0.820534
Nesterov: 0.588629
---- Distance: 0.1 meters ----
Mean running time (in us)
GJK: 0.51323
Nesterov: 0.527296
---- Distance: 1 meters ----
Mean running time (in us)
GJK: 0.513308
Nesterov: 0.531278
Thank you very much for your timely response. I am still cleaning up my code so it might take a while to upload. Since Louis has showcased his benchmarks, I am now trying to reproduce his results.
Hi guys, I have redone my benchmarks. My latest results do match what was claimed in the RSS 2022 paper. Thank you for your support!
I compared four solvers: OpenGJK, libccd (MPR), hpp-fcl's default GJK, and hpp-fcl's Nesterov-accelerated GJK.
For simplicity, benchmarks were done on two convex shapes, which have 532 and 341 vertices, respectively. Since I also want to measure the complexity against the number of vertices, a series of mesh simplifications was applied to each of the two shapes. Shape1 was simplified to 35, 68, 134, 267, and 532 (unsimplified) vertices. Shape2 was simplified to 22, 44, 86, 171, and 341 (unsimplified) vertices. Each pair of the above-mentioned convex shapes was used for benchmarks.
Shape1 with five levels of simplification:
Shape2 with five levels of simplification:
There are 6 sets of benchmarks, each with a different separating distance between two shapes.
The hardware I used was a PC with an Intel® Core™ i7-8700K CPU @ 3.70GHz × 12 and 32GB of memory. The system is Ubuntu 20.04.3 LTS.
The results are shown in the figure below:
There are four big columns in the figure above; each shows the results from one of the four solvers, whose name is printed on the first row. In each big column, there are 6 tables showing timing results (in microseconds), one per separating distance. The header of each table has the format:
ID=xxx, Distance=xxx, Time in us
The ID does not matter (it is for my own use only); Distance stands for the separating distance between the two shapes.
I am happy to explain further if I have not described my method and results clearly enough.
@OxDuke Thanks for the nice report. We are very happy that you've been able to reproduce our results.
Could you share your code so that we can enhance it and push for better performance in the near future?
No problem, please give me a few days to clean up my code.
@OxDuke Your raw code should already be enough, I would say. There is no need to spend extra time on it. If you have a shared repository, we can have a look and then integrate it into our benchmark pipeline.
Is it fine for you?
I am sorry, but I am not able to upload my raw code. Part of my code runs on my company's codebase, so for commercial reasons I have to rewrite that part in my spare time. Thanks for your understanding.
OK. No problem. We totally understand. Best, Justin
@OxDuke Any update on the code sharing?
Background
I come from the RSS 2022 paper, and I think your work is interesting. However, my preliminary benchmark results showed that hpp-fcl is not as fast as OpenGJK and libccd. I was computing the separating distance between two convex polytopes, one with over 500 vertices, the other with around 350 vertices.
OpenGJK took around 1us using a linear-time support function (so it could potentially be much faster with a log-time support function); libccd (which uses MPR) took around 1.5us; hpp-fcl's default GJK took 2.5us, and GJK-Nesterov took 3us.
I used Google's benchmark library to record the times mentioned above.
My result contradicts what was claimed in #313 and the performance benchmark in the paper:
Possible Reasons
Hardware issues.
The RSS paper did not give any details on the hardware used; I wish they could be disclosed here.
What I am using is a PC with an Intel® Core™ i7-8700K CPU @ 3.70GHz × 12 and 32GB of memory. The system is Ubuntu 20.04.3 LTS.
My implementation is wrong.
I paste a simplified version of my code here for reference. My implementation largely follows the test files. If needed, I can provide all my source files.
I would be most grateful if someone could help me achieve the performance claimed in the paper.