projectchrono / chrono

High-performance C++ library for multiphysics and multibody dynamics simulations
http://projectchrono.org
BSD 3-Clause "New" or "Revised" License

Memory errors: invalid free, invalid malloc, or corrupted size #500

Open Yohanumerics opened 3 months ago

Hi all,

I am trying to use Chrono for an FEA project (mainly the FEA module and the timestepper). Intermittently, the program crashes with the following message on standard error: corrupted size vs. prev_size ...or, alternatively: free(): invalid next size (normal)

A Valgrind run gave me some leads, and it suggests that the way Eigen is used might be a cause (or the cause). Valgrind emitted several messages of this kind:

==1396== Invalid free() / delete / delete[] / realloc()
==1396==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1396==    by 0x54D7886: chrono::ChTimestepperEulerImplicit::Advance(double) (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x4DC056E: chrono::ChSystem::Integrate_Y() (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x4DB9777: chrono::ChSystem::DoStepDynamics(double) (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x17FFCD: AV::CoreSimulation::processTransientSimulationStage() (AVCoreSimulation.cpp:2404)
==1396==    by 0x18D53E: AV::CoreSimulation::ProcessFullSimulation() (AVCoreSimulation.cpp:1415)
==1396==    by 0x13CB61: main (main.cpp:34)
==1396==  Address 0x6b27be00 is 16 bytes inside a block of size 1,334,672 alloc'd
==1396==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1396==    by 0x1956BB: handmade_aligned_malloc (Memory.h:105)
==1396==    by 0x1956BB: aligned_malloc (Memory.h:188)
==1396==    by 0x1956BB: conditional_aligned_malloc<true> (Memory.h:241)
==1396==    by 0x1956BB: conditional_aligned_new_auto<double, true> (Memory.h:404)
==1396==    by 0x1956BB: resize (DenseStorage.h:639)
==1396==    by 0x1956BB: resize (PlainObjectBase.h:285)
==1396==    by 0x1956BB: resize_if_allowed<Eigen::Matrix<double, -1, 1>, Eigen::Matrix<double, -1, 1>, double, double> (AssignEvaluator.h:764)
==1396==    by 0x1956BB: void Eigen::internal::call_dense_assignment_loop<Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::internal::assign_op<double, double> >(Eigen::Matrix<double, -1, 1, 0, -1, 1>&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::internal::assign_op<double, double> const&) (AssignEvaluator.h:778)
==1396==    by 0x54E5D50: chrono::ChStateDelta::operator*(double) const (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x54D7836: chrono::ChTimestepperEulerImplicit::Advance(double) (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x4DC056E: chrono::ChSystem::Integrate_Y() (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x4DB9777: chrono::ChSystem::DoStepDynamics(double) (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x17FFCD: AV::CoreSimulation::processTransientSimulationStage() (AVCoreSimulation.cpp:2404)
==1396==    by 0x18D53E: AV::CoreSimulation::ProcessFullSimulation() (AVCoreSimulation.cpp:1415)
==1396==    by 0x13CB61: main (main.cpp:34)
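
One possible reading of the "16 bytes inside a block" line, offered as a hypothesis and not as a confirmed diagnosis: the block was allocated through Eigen's handmade_aligned_malloc (inlined into the application binary, per the trace), which hands out a pointer that is offset from what malloc returned, while the free() is issued from inside libChronoEngine.so. If the two sides were compiled with different alignment settings, the library might pass that offset pointer straight to free(). The sketch below is a simplified model of such an over-allocate-and-offset scheme. The function names are hypothetical and this is not Eigen's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Simplified model (NOT Eigen's code): over-allocate, round up to the next
// multiple of `alignment`, and stash the pointer returned by malloc in the
// slot just before the aligned address.
void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    // The scheme needs a power-of-two alignment large enough to hold a pointer.
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* original = std::malloc(size + alignment);
    if (original == nullptr) return nullptr;
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    std::uintptr_t aligned = (raw & ~(static_cast<std::uintptr_t>(alignment) - 1)) + alignment;
    void* result = reinterpret_cast<void*>(aligned);
    *(reinterpret_cast<void**>(result) - 1) = original;  // remember the real block start
    return result;
}

// The matching release: recover the pointer malloc returned and free that one.
// Calling std::free(ptr) directly on the aligned pointer would be invalid.
void sketch_aligned_free(void* ptr) {
    if (ptr != nullptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
}

// Number of bytes between the start of the malloc'd block and the pointer
// the caller actually sees.
std::size_t offset_inside_block(void* ptr) {
    void* original = *(reinterpret_cast<void**>(ptr) - 1);
    return static_cast<std::size_t>(static_cast<char*>(ptr) - static_cast<char*>(original));
}
```

With a 32-byte alignment and a malloc that returns 16-byte-aligned blocks, offset_inside_block can come out as 16, which would be consistent with the address Valgrind reports. Memory obtained this way must be released with the matching routine; a plain free() on the offset pointer is the kind of call Valgrind flags as an invalid free.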

...and the run eventually crashes with the following Valgrind output:

==1396== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1396==  General Protection Fault
==1396==    at 0x195630: _mm256_store_pd (avxintrin.h:875)
==1396==    by 0x195630: pstore<double, __vector(4) double> (PacketMath.h:623)
==1396==    by 0x195630: pstoret<double, __vector(4) double, 32> (GenericPacketMath.h:978)
==1396==    by 0x195630: assignPacket<32, __vector(4) double> (AssignmentFunctors.h:28)
==1396==    by 0x195630: assignPacket<32, 32, __vector(4) double> (AssignEvaluator.h:681)
==1396==    by 0x195630: run (AssignEvaluator.h:437)
==1396==    by 0x195630: void Eigen::internal::call_dense_assignment_loop<Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>, Eigen::internal::assign_op<double, double> >(Eigen::Matrix<double, -1, 1, 0, -1, 1>&, Eigen::Matrix<double, -1, 1, 0, -1, 1> const&, Eigen::internal::assign_op<double, double> const&) (AssignEvaluator.h:785)
==1396==    by 0x54D785D: chrono::ChTimestepperEulerImplicit::Advance(double) (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x4DC056E: chrono::ChSystem::Integrate_Y() (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x4DB9777: chrono::ChSystem::DoStepDynamics(double) (in /root/Tools/chrono/aad5e16bf2c585fca6d00655879bba3c550a9c9e/install/lib/libChronoEngine.so)
==1396==    by 0x17FFCD: AV::CoreSimulation::processTransientSimulationStage() (AVCoreSimulation.cpp:2404)
==1396==    by 0x18D53E: AV::CoreSimulation::ProcessFullSimulation() (AVCoreSimulation.cpp:1415)
==1396==    by 0x13CB61: main (main.cpp:34)
==1396== 
==1396== HEAP SUMMARY:
==1396==     in use at exit: 2,238,539,575 bytes in 574,867 blocks
==1396==   total heap usage: 188,820,125 allocs, 188,245,272 frees, 57,592,716,416 bytes allocated
==1396== 
==1396== LEAK SUMMARY:
==1396==    definitely lost: 14,681,512 bytes in 16 blocks
==1396==    indirectly lost: 0 bytes in 0 blocks
==1396==      possibly lost: 1,235,421,599 bytes in 141,532 blocks
==1396==    still reachable: 988,436,464 bytes in 433,319 blocks
==1396==         suppressed: 0 bytes in 0 blocks

I recently migrated to Chrono 9.0.0 and see the same behavior.

I am using Eigen version 3.4.0-2ubuntu2.

I tried building with the option "-DEIGEN_DONT_VECTORIZE" (as advised here: https://stackoverflow.com/questions/42181586/sigsegv-using-eigen-and-stdvector ), but the program then crashes even earlier, with the following message: malloc(): invalid size (unsorted) ...and Valgrind reports many "invalid write" errors where Chrono objects are instantiated (behind chrono_types::make_shared calls) until it reaches its maximum number of errors. So I guess this is not the solution I am looking for.
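
Since the failure changes when a vectorization-related define changes, and the crash trace shows _mm256_store_pd (an AVX intrinsic) inlined in the application binary, it may be worth checking whether the application and libChronoEngine.so are compiled with the same SIMD flags. As far as I understand, Eigen derives its default alignment from the instruction sets the compiler has enabled, so a mismatch between the two builds could make them disagree on how Eigen buffers are allocated and freed. The helper below is a minimal sketch, assuming GCC or Clang predefined macro names; simd_summary is a hypothetical name, not part of Chrono or Eigen.

```cpp
#include <cassert>
#include <string>

// Lists the SIMD feature macros that the compiler predefines for this
// translation unit, given the flags it was compiled with (GCC/Clang names).
std::string simd_summary() {
    std::string s;
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __AVX__
    s += "AVX ";
#endif
#ifdef __AVX2__
    s += "AVX2 ";
#endif
#ifdef __FMA__
    s += "FMA ";
#endif
#ifdef __AVX512F__
    s += "AVX512F ";
#endif
    if (s.empty()) s = "none";
    assert(!s.empty());  // sanity check: the summary always contains something
    return s;
}
```

Printing simd_summary() from a small program built once with the application's compile flags and once with the flags used for the Chrono build, then comparing the two outputs, would show whether the builds differ in this respect.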