loco-3d / crocoddyl

Crocoddyl is an optimal control library for robot control under contact sequence. Its solver is based on various efficient Differential Dynamic Programming (DDP)-like algorithms
BSD 3-Clause "New" or "Revised" License
808 stars 166 forks source link

Memory alignement issue with boost python #864

Closed proyan closed 3 years ago

proyan commented 3 years ago

I'm facing a memory alignment issue, which I hope @jmirabel or @jcarpent could help with:

When trying to access the reference oject of CostModelContactForce in python, (which is of type crocoddyl::FrameForceTpl<double> defined here), sometimes I encounter a segfault. It doesn't come always, sometimes it works, and sometimes it doesn't.

With @nmansard, we made a simple test based on the existing utils/biped.py file, in order to check what is happening. We were able to reproduce the issue in both our systems.

The file is attached. When I try to access this reference object after the simulation, I get the segfault.

This is the line which leads to segfault:

a=m.differential.costs.costs['force16'].cost.reference

When investigating further, the error seems to be coming while allocating creating an instance of pinocchio::ForceTpl object in python. gdb tells me that the segfault arrives after the code has left get_reference in the python bindings, and when the new python instance of FrameForce object (and consequently the pinocchio::Force object) is being created.

When I look at the assembly code of what is happening, it looks clear that this is a memory alignment issue. The segfault is coming from this line in Eigen:

Thread 1 "ipython" received signal SIGSEGV, Segmentation fault.
0x00007f1432dc2306 in Eigen::DenseStorage<double, 6, 6, 1, 0>::DenseStorage (this=0x7f142c675bc8, other=...) at /usr/include/eigen3/Eigen/src/Core/DenseStorage.h:194
194     DenseStorage(const DenseStorage& other) : m_data(other.m_data) {

where the 6 dimensional m_data object is being copied from the pinocchio::Force object to the boost python instance (not sure which). In order to do this copy, following are the assembly instructions:

; The following lines move the m_data from the pinocchio::Force object (at memory location given by %r14) to the register xmm. 
; Each line moves two doubles to the register.

movaps 0x10(%r14),%xmm0        
movaps 0x20(%r14),%xmm1
movaps 0x30(%r14),%xmm2

; The following lines move the m_data from the registers to the memory location given by %rbx
movaps %xmm0,0x50(%rbx)   ; Segfaults
movaps %xmm1,0x60(%rbx)
movaps %xmm2,0x70(%rbx)

If I look at the address r14, I see that it points to an address that is 128 bit aligned in memory.

(gdb) info registers r14
r14            0x7fffbd4924d0      140736369075408

Thus, the first mov succeeds. However, the same is not true for the register rbx

(gdb) info registers rbx
rbx            0x7f142c675b78      139724621044600

The address ends with 78, and is not 128 bit aligned in memory, and I think this is the reason that the segfault appears.

CostModelContactForce, FrameForce, and pinocchio::Force, all have the macroEIGEN_MAKE_ALIGNED_OPERATOR_NEW` defined in public, and thus they are all always aligned in memory. However, the boost python class that creates binds these quantities don't have this macro. Hence, I suspect that the object instance is being created at a misaligned memory location by boost python.

My question is, since this issue is not appearing in pinocchio::Force, what is being done differently there that is not being done here when we do our bindings? For reference, the class ContactForce is being binded here: https://github.com/loco-3d/crocoddyl/blob/devel/bindings/python/crocoddyl/multibody/costs/contact-force.cpp#L91

In order to reproduce, please run this file in python (https://gist.github.com/proyan/61bf620ee887b872e1c895af905940b3) , and then run the following line

a=m.differential.costs.costs['force16'].cost.reference

It may, or maynot work (based on the alignment, there is a 50% chance that some change in the code would make it not working). The assembly trace above is for the cases it doesn't work.

jmirabel commented 3 years ago
  1. is there any possibility that crocoddyl is compiled with different compilation options that other Eigen based projects it depends on ? Anything from robotpkg ?
  2. can you check with gdb that the reference return by get_reference, just before the SEGV, is a valid object.
  3. if the issue is a misaligned object, you should be able to create a minimal example.
jcarpent commented 3 years ago

I think you are not saying to Python that you are returning an internal reference ;)

jcarpent commented 3 years ago

The second thing is that you should maybe need to use EIGENPY_DEFINE_STRUCT_ALLOCATOR_SPECIALIZATION to make things correctly aligned in memory. In Pinocchio, this is done for SE3 and other objects

cmastalli commented 3 years ago

@proyan I don't have this issue when I enable the vectorization option. Are you compiling with native all the Crocoddyl dependencies?

proyan commented 3 years ago

The second thing is that you should maybe need to use EIGENPY_DEFINE_STRUCT_ALLOCATOR_SPECIALIZATION to make things correctly aligned in memory. In Pinocchio, this is done for SE3 and other objects

Thanks @jcarpent I think this is the solution I was looking for. The solution suggested by google was to create a specific aligned allocator as well. I can't reproduce the error now (which may not mean that the problem is solved, since I cannot consistently reproduce the issue.)

is there any possibility that crocoddyl is compiled with different compilation options that other Eigen based projects it depends on ? Anything from robotpkg ? can you check with gdb that the reference return by get_reference, just before the SEGV, is a valid object. if the issue is a misaligned object, you should be able to create a minimal example.

crocoddyl was being compiled with the same compilation options. And I had verified that get_reference actually does return a value. The problem happened when boost python takes this object and uses it to print on the python interpreter., or assign to a new variable.

I think you are not saying to Python that you are returning an internal reference ;)

That is true, currently a copy is being made. We don't want that.

@proyan I don't have this issue when I enable the vectorization option. Are you compiling with native all the Crocoddyl dependencies?

This is not related to the vectorization option.

proyan commented 3 years ago

Fixed with #865