orocos / orocos_kinematics_dynamics

Orocos Kinematics and Dynamics C++ library
694 stars 407 forks source link

AddressSanitizer throws errors #415

Closed rhaschke closed 2 years ago

rhaschke commented 2 years ago

I'm getting asan errors when assigning KDL::JntArray::data. I tracked the issue down to the following simple code snippet, which looks perfectly ok in my opinion:

#include <kdl/jntarray.hpp>
#include <iostream>

void f(const std::array<double, 7> &q)
{
#if 1  // triggers an asan error
    KDL::JntArray kq;  // when releasing this JntArray's data
    kq.data = Eigen::Matrix<double, 7, 1>::Map(q.data());  // which was allocated here
    std::cout << kq.data << std::endl;
#else // the error vanishes if Eigen::MatrixXd is used directly
    Eigen::MatrixXd data;
    data = Eigen::Matrix<double, 7, 1>(q.data());
    std::cout << data << std::endl;
#endif
}

int main(int argc, char const *argv[])
{
    std::array<double, 7> q {1,2,3,4,5,6,7};
    f(q);
    return 0;
}

Compiling this with CXXFLAGS="-g -fsanitize=address -fno-omit-frame-pointer -O0" yields:

AddressSanitizer: attempting free on address which was not malloc()-ed: 0x607000000110 in thread T0
#1 0x55ef15b5c7a2 in f(std::array<double, 7ul> const&) /vol/s4dx/src/assembly_demo/debug.cpp:7

0x607000000110 is located 16 bytes inside of 72-byte region [0x607000000100,0x607000000148)
allocated by thread T0 here:
#15 0x55ef15b5c8ba in f(std::array<double, 7ul> const&) /vol/s4dx/src/assembly_demo/debug.cpp:8

Sometimes I get out-of-bound READs in a similar context as well, but I could not distill a minimal example for that.

meyerj commented 2 years ago

Interesting case! Thanks for the example program.

I could reproduce the asan error using your example with GCC version 9.4.0 on a 64-bit Ubuntu 20.04 Focal system and an installation of liborocos-kdl-dev version 1.4.0-7ubuntu1 from Debian/Ubuntu and libeigen3-dev version 3.3.7-2.

I think it is related to how Eigen allocates and frees dynamic memory if the host's malloc cannot guarantee 16-byte alignment or whatever alignment is needed for vectorized, see Memory.h. handmade_aligned_malloc() is a kind of dirty trick to allocate up to 16 bytes more than what is actually needed. The original pointer returned by std::malloc() is then stored just before the segment returned to the caller, and its counterpart handmade_aligned_free() then looks up that address to call std::free(). Depending on what calls the sanitizer intercepts, it may wrongly assume that the pointer that is freed is not the same as the one returned by Eigen's aligned_allocator<T> (16 byte into what std::malloc() returned). However, the calls to malloc and free should still be consistent...

The following things I tested remove the sanitizer error:

Update: If I compile without the address sanitizer handmade_aligned_malloc() and handmade_aligned_free() do not even get called. That is good, because normally on 64-bit platforms std::malloc() does already return aligned pointers and Eigen should define EIGEN_MALLOC_ALREADY_ALIGNED itself. But for some reason the detection in Memory.h includes a check of __SANITIZE_ADDRESS__, and hence the allocation is done trough handmade_aligned_malloc() but the destructor of KDL::JntArray that was compiled without invokes free() directly. That is what triggers the error then. Apparently __SANITIZE_ADDRESS__ is not defined with Clang...

So in summary: Not an actual issue, but GCC's address santizer fools Eigen's alignment detection. And because the program under test and KDL were not compiled with the same flags, the address santizer itself causes an actual memory problem.

meyerj commented 2 years ago

https://gitlab.com/libeigen/eigen/-/commit/b6dc2613acbb4659988eb3237225bb0974d85d52

But adding -DEIGEN_MALLOC_ALREADY_ALIGNED compiler option would not help, if std::malloc() could truely return unaligned pointers with -fsanitize=address, as mentioned in https://gitlab.com/libeigen/eigen/-/issues/552...

rhaschke commented 2 years ago

That manual allocation of JntArray::data uses handmade_aligned_malloc() while its deallocation doesn't is a good explanation for the observed behavior. I will switch to clang then...