rst-tu-dortmund / teb_local_planner

An optimal trajectory planner considering distinctive topologies for mobile robots based on Timed-Elastic-Bands (ROS Package)
http://wiki.ros.org/teb_local_planner
BSD 3-Clause "New" or "Revised" License
985 stars 546 forks source link

[bug] compiling with `-march=native` causes crash: `double free or corruption (out)` #336

Open VRichardJP opened 2 years ago

VRichardJP commented 2 years ago

I have been playing with compilation flags to see if I can slightly speed up teb_local_planner. In particular I have tested -march=native, which seems to always cause a double free or corruption (out)

Step to reproduce:

Logs:

$ roslaunch teb_local_planner test_optim_node.launch 
... logging to /home/dev/.ros/log/623bd2d8-56f4-11ec-adbd-0242ac110002/roslaunch-0ad64aca3e9e-5993.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://0ad64aca3e9e:40529/

SUMMARY
========

PARAMETERS
 * /rosdistro: melodic
 * /rosversion: 1.14.11

NODES
  /
    rviz (rviz/rviz)
    test_optim_node (teb_local_planner/test_optim_node)

auto-starting new master
process[master]: started with pid [6006]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 623bd2d8-56f4-11ec-adbd-0242ac110002
process[rosout-1]: started with pid [6018]
started core service [/rosout]
process[test_optim_node-2]: started with pid [6022]
process[rviz-3]: started with pid [6027]
[ INFO] [1638836817.022480729]: No robot footprint model specified for trajectory optimization. Using point-shaped model.
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-dev'
double free or corruption (out)
[test_optim_node-2] process has died [pid 6022, exit code -6, cmd /home/dev/workspace/install/teb_local_planner/lib/teb_local_planner/test_optim_node __name:=test_optim_node __log:=/home/dev/.ros/log/623bd2d8-56f4-11ec-adbd-0242ac110002/test_optim_node-2.log].
log file: /home/dev/.ros/log/623bd2d8-56f4-11ec-adbd-0242ac110002/test_optim_node-2*.log

On my machine, -march=native does this:

$ gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
 /usr/lib/gcc/x86_64-linux-gnu/7/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu - -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -msgx -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=12288 -mtune=generic -fstack-protector-strong -Wformat -Wformat-security
VRichardJP commented 2 years ago

After some trials and error I found the actual culprit to be -mf16c. This enable half<->float conversion. It seems that gcc uses it at several places:

$ objdump -d /home/dev/workspace/install/teb_local_planner/lib/libteb_local_planner.so  | grep vcvtp
   d4022:       c5 fd 5a c0             vcvtpd2ps %ymm0,%xmm0
   d402c:       c5 fd 5a c9             vcvtpd2ps %ymm1,%xmm1
   d6414:       c5 fd 5a e1             vcvtpd2ps %ymm1,%xmm4

I don't really get why this would cause a double free.

Just to make sure it is not just a gcc thing, I also compiled teb with clang. In this case the crash is even easier to reproduce: the program crashes on startup with the same double free or corruption (out) error.

clang seems again to make use of the F16C instructions:

$ objdump -d /home/dev/workspace/install/teb_local_planner/lib/libteb_local_planner.so  | grep vcvtp
  1024b5:       c5 f8 5a 01             vcvtps2pd (%rcx),%xmm0
  10258a:       c5 fc 5a c0             vcvtps2pd %xmm0,%ymm0
  1026d2:       c4 a1 78 5a 04 29       vcvtps2pd (%rcx,%r13,1),%xmm0
  102702:       c4 a1 78 5a 04 29       vcvtps2pd (%rcx,%r13,1),%xmm0
  10281d:       c5 f8 5a 00             vcvtps2pd (%rax),%xmm0

To reproduce:

$ rm -rf build/ install/
$ export CC=clang
$ export CXX=clang++
$ colcon build <...>
$ roslaunch teb_local_planner test_optim_node.launch

The program will instantly crash

VRichardJP commented 2 years ago

Running test_optim_node with valgrind shows the problem may come from g2o library that is included with melodic, or the way BaseTebMultiEdge is used in teb local planner:

==4330== Memcheck, a memory error detector
==4330== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4330== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4330== Command: /home/dev/workspace/install/teb_local_planner/lib/teb_local_planner/test_optim_node __name:=test_optim_node __log:=/home/dev/.ros/log/840700a0-5743-11ec-9eb2-0242ac110002/test_optim_node-1.log
==4330== 
process[rviz-2]: started with pid [4353]
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-dev'
==4330== Invalid free() / delete / delete[] / realloc()
==4330==    at 0x4C32D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4330==    by 0x6ECE6FC: g2o::EdgeSE2PointXYCalib::~EdgeSE2PointXYCalib() (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6EC710A: ??? (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x40108D2: call_init (dl-init.c:72)
==4330==    by 0x40108D2: _dl_init (dl-init.c:119)
==4330==    by 0x40010C9: ??? (in /lib/x86_64-linux-gnu/ld-2.27.so)
==4330==    by 0x2: ???
==4330==    by 0x1FFEFFFE3E: ???
==4330==    by 0x1FFEFFFE94: ???
==4330==    by 0x1FFEFFFEAC: ???
==4330==  Address 0x304ac7c0 is 16 bytes inside a block of size 128 alloc'd
==4330==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4330==    by 0x4EE2FA9: handmade_aligned_malloc (Memory.h:88)
==4330==    by 0x4EE2FA9: aligned_malloc (Memory.h:164)
==4330==    by 0x4EE2FA9: Eigen::aligned_allocator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > >::allocate(unsigned long, void const*) (Memory.h:742)
==4330==    by 0x4EE1777: allocate (alloc_traits.h:301)
==4330==    by 0x4EE1777: _M_allocate (stl_vector.h:172)
==4330==    by 0x4EE1777: std::vector<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::aligned_allocator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >*, std::vector<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::aligned_allocator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > > > >, unsigned long, Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const&) (vector.tcc:505)
==4330==    by 0x6ECFB54: g2o::BaseMultiEdge<2, Eigen::Matrix<double, 2, 1, 0, 2, 1> >::resize(unsigned long) (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6ECE21B: g2o::EdgeSE2PointXYCalib::EdgeSE2PointXYCalib() (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6ED5A9F: g2o::HyperGraphElementCreator<g2o::EdgeSE2PointXYCalib>::construct() (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6A6B36E: g2o::Factory::registerType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, g2o::AbstractHyperGraphElementCreator*) (in /opt/ros/melodic/lib/libg2o_core.so)
==4330==    by 0x6EC710A: ??? (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x40108D2: call_init (dl-init.c:72)
==4330==    by 0x40108D2: _dl_init (dl-init.c:119)
==4330==    by 0x40010C9: ??? (in /lib/x86_64-linux-gnu/ld-2.27.so)
==4330==    by 0x2: ???
==4330==    by 0x1FFEFFFE3E: ???
==4330== 
==4330== Invalid free() / delete / delete[] / realloc()
==4330==    at 0x4C32D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4330==    by 0x6EE723C: g2o::EdgeSE2TwoPointsXY::~EdgeSE2TwoPointsXY() (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6EC745E: ??? (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x40108D2: call_init (dl-init.c:72)
==4330==    by 0x40108D2: _dl_init (dl-init.c:119)
==4330==    by 0x40010C9: ??? (in /lib/x86_64-linux-gnu/ld-2.27.so)
==4330==    by 0x2: ???
==4330==    by 0x1FFEFFFE3E: ???
==4330==    by 0x1FFEFFFE94: ???
==4330==    by 0x1FFEFFFEAC: ???
==4330==  Address 0x304ae280 is 16 bytes inside a block of size 128 alloc'd
==4330==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4330==    by 0x4EE2FA9: handmade_aligned_malloc (Memory.h:88)
==4330==    by 0x4EE2FA9: aligned_malloc (Memory.h:164)
==4330==    by 0x4EE2FA9: Eigen::aligned_allocator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > >::allocate(unsigned long, void const*) (Memory.h:742)
==4330==    by 0x4EE1777: allocate (alloc_traits.h:301)
==4330==    by 0x4EE1777: _M_allocate (stl_vector.h:172)
==4330==    by 0x4EE1777: std::vector<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::aligned_allocator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >*, std::vector<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::aligned_allocator<Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > > > >, unsigned long, Eigen::Map<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> > const&) (vector.tcc:505)
==4330==    by 0x6EE7954: g2o::BaseMultiEdge<4, Eigen::Matrix<double, 4, 1, 0, 4, 1> >::resize(unsigned long) (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6EE6D8B: g2o::EdgeSE2TwoPointsXY::EdgeSE2TwoPointsXY() (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6ED56AF: g2o::HyperGraphElementCreator<g2o::EdgeSE2TwoPointsXY>::construct() (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x6A6B36E: g2o::Factory::registerType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, g2o::AbstractHyperGraphElementCreator*) (in /opt/ros/melodic/lib/libg2o_core.so)
==4330==    by 0x6EC745E: ??? (in /opt/ros/melodic/lib/libg2o_types_slam2d.so)
==4330==    by 0x40108D2: call_init (dl-init.c:72)
==4330==    by 0x40108D2: _dl_init (dl-init.c:119)
==4330==    by 0x40010C9: ??? (in /lib/x86_64-linux-gnu/ld-2.27.so)
==4330==    by 0x2: ???
==4330==    by 0x1FFEFFFE3E: ???
==4330==
profxavr commented 1 year ago

Did you find a way around this ever? I am running into this exact issue right now.

VRichardJP commented 1 year ago

I can't really tell. After this I stopped trying to use -march=native for a while.

Recently, I chased after strange bug in our codebase: one of our custom edges behaved totally wrong (the kind of 1+1=-7). With valgrind I observed memory leaks from the offending edge. After reordering the edge code a bit, the problem and memory leak disappeared (the code logic is still the same).

This looks typically like an Eigen memory alignment issue, somewhere in teb or g2o. But I couldn't not say where.

After the "fix", I went on and tried to use -march=native and now it works. No idea why.

So I can suggest you a few things: