hungpham2511 / toppra

robotic motion planning library
https://hungpham2511.github.io/toppra/index.html
MIT License
612 stars 169 forks source link

Fixed bug in ConstAccel parametrizer where evaluating valid times can sometimes result in an exception being thrown. #185

Closed stevegolton closed 2 years ago

stevegolton commented 2 years ago

Fixes #184

Changes in this PR:

Checklists:

Description of the PR

Consider the following snippet:

auto p = toppra::parametrizer::ConstAccel(path, gridpoints, vsquared);
auto bound = p.pathInterval();
p.eval_single(bound[1]));

The call to eval_single(bound[1]) can sometimes throw an error for certain paths, even though the time bound[1] should be a valid evaluation time. This is due to a bug inside the ConstAccel parametrizer. This PR fixes this bug.

The ConstAccel object contains a pointer to a GeometricPath object called m_path which is injected in at construction. When ConstAccel::eval_single() or ConstAccel::eval() is called, the ConstAccel object translates the time passed in to a value of s which is subsequently used in a call to the underlying m_path->eval().

As long as the time passed into ConstAccel::eval() in the first place is within the bounds defined by ConstAccel::pathInterval(), the resulting call to m_path->eval() should be within the bounds defined by m_path->pathInterval(), but due to floating point numerical errors and the way the translation works, the isn't always the case. Sometimes the resulting value of s passed to m_path->eval() can be outside the bounds defined by m_path->pathInterval(), and if m_path is of type PiecewisePolyPath, a std::runtime_error is thrown:

https://github.com/hungpham2511/toppra/blob/a70f6a217ddf20c42b62471ab52ce16884a320c1/cpp/src/toppra/geometric_path/piecewise_poly_path.cpp#L148-L153

In practice, this means it's dangerous to sample arbitrary trajectories using e.g. Eigen's LinSpaced() function, as sometimes this will fail when the underlying m_path object throws an error, especially with times which are very close to the bounds.

auto bound = p.pathInterval();
toppra::Vector times = toppra::Vector::LinSpaced(N, bound[0], bound[1]);
p.eval(times); // <-- can sometimes fail

Instead, the caller must do something nasty like this:

auto bound = p.pathInterval();
toppra::Vector times = toppra::Vector::LinSpaced(N, bound[0], bound[1]);
const double EPS = 1e-5;
times[0] += EPS;
times[N-1] -= EPS;
p.eval(times); // <-- will hopefully never fail..??!

Note: We assume the following assertions hold true, which appears to be the case for the LinSpaced() function.

toppra::Vector times = toppra::Vector::LinSpaced(N, A, B);
assert(times[0] == A);
assert(times[N-1] == B);

The most important thing this PR adds is a clamp on any value before it is passed to the underlying m_path->eval() in order to move it within the bounds defined by m_path->pathInterval(). However, this means most values passed to ConstAccel::eval() which are outside the bounds defined by ConstAccel::pathInterval() get silently clamped, which is not friendly for debugging. Thus, the second thing this PR adds is a specific check that all times passed to ConstAccel::eval() are within the range ConstAccel::pathInterval() otherwise a std::runtime_error is thrown, which is exactly what the PeicewisePolyPath class does.

This PR does not address any issue which might arise from the caller doing something like the following, either inadvertently or not.

auto p = toppra::parametrizer::ConstAccel(path, gridpoints, vsquared);
auto bound = p.pathInterval();
p.eval_single(bound[1] + SOME_TINY_VALUE));

This is because the caller can check for this case themselves, whereas in the bug discussed above it is not possible to do so from the outside. Also, this is an obvious violation of the ConstAccel object's interface.

hungpham2511 commented 2 years ago

Thank you for the pull request!

I will try to review and merge as soon as I can. Thanks again for the contribution.

jmirabel commented 2 years ago

May I know what you try to do ?

If 25363210 is in second, it is equal to 0.8 years, which is non-sense for a robot trajectory.

stevegolton commented 2 years ago

May I know what you try to do ?

If 25363210 is in second, it is equal to 0.8 years, which is non-sense for a robot trajectory.

I've encountered this bug on sensical trajectories - e.g. trajectories of a few seconds in duration - but I didn't manage to capture the conditions. The test here is just a canned example which happened to reproduce the issue.

Edit:

I just updated the test to use a more realistic path length:

double PATH_LEN = 3.51363644474459757560680;
hungpham2511 commented 2 years ago

May I know what you try to do ? If 25363210 is in second, it is equal to 0.8 years, which is non-sense for a robot trajectory.

I've encountered this bug on sensical trajectories - e.g. trajectories of a few seconds in duration - but I didn't manage to capture the conditions. The test here is just a canned example which happened to reproduce the issue.

Sounds interesting. Please let us know if you spot another instance of this error again. This can happen if you have a near-zero path velocity, which again can happen if you i) use the basic interpolation scheme and ii) have a tricky path.

stevegolton commented 2 years ago

May I know what you try to do ? If 25363210 is in second, it is equal to 0.8 years, which is non-sense for a robot trajectory.

I've encountered this bug on sensical trajectories - e.g. trajectories of a few seconds in duration - but I didn't manage to capture the conditions. The test here is just a canned example which happened to reproduce the issue.

Sounds interesting. Please let us know if you spot another instance of this error again. This can happen if you have a near-zero path velocity, which again can happen if you i) use the basic interpolation scheme and ii) have a tricky path.

It's quite easy to reproduce by trying random path lengths. Consider the following test, which tries up to 100 random path lengths on the ConstAccel parametrizer. On my machine, I usually end up with this bug cropping up within a handful of attempts. Note: I didn't include this test in the pull request, as randomness doesn't make for a good unit test.

The bug appears to be an artifact of trying to arrive at the same number via different methods which ends up with slightly different values due to floating point numerical errors. This is fine in the middle of the path but breaks down at the two ends of the path.

TEST(ParametrizeConstAccelNoFixture, BoundsViolationRandom) {
  std::random_device rd;
  std::mt19937 e2(rd());
  std::uniform_real_distribution<> dist(0, 10.0);

  for (int i = 0; i < 1e3; ++i)
  {
    double PATH_LEN = dist(e2);
    std::cout << std::setprecision(32) << PATH_LEN << '\n';

    toppra::Vectors positions = {
      toppra::Vector::Zero(1),
      toppra::Vector::Zero(1)
    };
    toppra::Vectors velocities = {
      toppra::Vector::Zero(1),
      toppra::Vector::Zero(1)
    };

    toppra::Vector times = toppra::Vector::LinSpaced(2, 0, PATH_LEN);
    auto std_times = std::vector<double>(times.data(), times.data() + times.size());

    auto path = toppra::PiecewisePolyPath::constructHermite(positions, velocities, std_times);
    auto ppath = std::make_shared<toppra::PiecewisePolyPath>(path);

    toppra::Vector gridpoints = toppra::Vector::LinSpaced(10, 0, PATH_LEN);
    toppra::Vector vsquared{10};
    vsquared << 0, 0.1, 0.2, 0.3, 0.5, 0.5, 0.3, 0.2, 0.1, 0.0;
    auto p = toppra::parametrizer::ConstAccel(ppath, gridpoints, vsquared);

    auto bound = p.pathInterval();
    auto qs = p.eval_single(bound[1]);
  }
}
hungpham2511 commented 2 years ago

May I know what you try to do ? If 25363210 is in second, it is equal to 0.8 years, which is non-sense for a robot trajectory.

I've encountered this bug on sensical trajectories - e.g. trajectories of a few seconds in duration - but I didn't manage to capture the conditions. The test here is just a canned example which happened to reproduce the issue.

Sounds interesting. Please let us know if you spot another instance of this error again. This can happen if you have a near-zero path velocity, which again can happen if you i) use the basic interpolation scheme and ii) have a tricky path.

It's quite easy to reproduce by trying random path lengths. Consider the following test, which tries up to 100 random path lengths on the ConstAccel parametrizer. On my machine, I usually end up with this bug cropping up within a handful of attempts. Note: I didn't include this test in the pull request, as randomness doesn't make for a good unit test.

The bug appears to be an artifact of trying to arrive at the same number via different methods which ends up with slightly different values due to floating point numerical errors. This is fine in the middle of the path but breaks down at the two ends of the path.

TEST(ParametrizeConstAccelNoFixture, BoundsViolationRandom) {
  std::random_device rd;
  std::mt19937 e2(rd());
  std::uniform_real_distribution<> dist(0, 10.0);

  for (int i = 0; i < 1e3; ++i)
  {
    double PATH_LEN = dist(e2);
    std::cout << std::setprecision(32) << PATH_LEN << '\n';

    toppra::Vectors positions = {
      toppra::Vector::Zero(1),
      toppra::Vector::Zero(1)
    };
    toppra::Vectors velocities = {
      toppra::Vector::Zero(1),
      toppra::Vector::Zero(1)
    };

    toppra::Vector times = toppra::Vector::LinSpaced(2, 0, PATH_LEN);
    auto std_times = std::vector<double>(times.data(), times.data() + times.size());

    auto path = toppra::PiecewisePolyPath::constructHermite(positions, velocities, std_times);
    auto ppath = std::make_shared<toppra::PiecewisePolyPath>(path);

    toppra::Vector gridpoints = toppra::Vector::LinSpaced(10, 0, PATH_LEN);
    toppra::Vector vsquared{10};
    vsquared << 0, 0.1, 0.2, 0.3, 0.5, 0.5, 0.3, 0.2, 0.1, 0.0;
    auto p = toppra::parametrizer::ConstAccel(ppath, gridpoints, vsquared);

    auto bound = p.pathInterval();
    auto qs = p.eval_single(bound[1]);
  }
}

This certainly sounds like a bug :sweat_smile: