Closed proyan closed 3 years ago
The second thing is that you may need to use EIGENPY_DEFINE_STRUCT_ALLOCATOR_SPECIALIZATION to make things correctly aligned in memory. In Pinocchio, this is done for SE3 and other objects.
Thanks @jcarpent, I think this is the solution I was looking for. The solution suggested by Google was also to create a specific aligned allocator. I can't reproduce the error now (which may not mean that the problem is solved, since I cannot consistently reproduce the issue).
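For context, the idea behind an aligned allocator can be sketched in a few lines. The snippet below is only an illustration in Python with `ctypes`, not what eigenpy or Boost.Python actually do: it over-allocates a buffer and skips ahead to the next 16-byte boundary, which is the usual trick behind aligned allocation. The helper name `aligned_doubles` and the choice of six doubles (to mimic a spatial force) are assumptions made for the example.

```python
import ctypes

ALIGNMENT = 16  # bytes (128 bits), the boundary discussed in this issue


def aligned_doubles(count, alignment=ALIGNMENT):
    """Return (array, backing) where `array` holds `count` doubles at an aligned address.

    Hypothetical helper for illustration only.
    """
    nbytes = count * ctypes.sizeof(ctypes.c_double)
    # Over-allocate so that an aligned start address must exist inside the buffer.
    backing = bytearray(nbytes + alignment)
    base = ctypes.addressof(ctypes.c_char.from_buffer(backing))
    # Number of bytes to skip to reach the next multiple of `alignment`.
    offset = (-base) % alignment
    array = (ctypes.c_double * count).from_buffer(backing, offset)
    return array, backing


force, backing = aligned_doubles(6)  # six doubles, like a spatial force vector
print(ctypes.addressof(force) % ALIGNMENT)  # 0
```

Without the offset step, the start address would simply be whatever the underlying allocator returned, which may or may not sit on a 16-byte boundary.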
Is there any possibility that crocoddyl is compiled with different compilation options than the other Eigen-based projects it depends on? Anything from robotpkg? Can you check with gdb that the reference returned by get_reference, just before the SEGV, is a valid object? If the issue is a misaligned object, you should be able to create a minimal example.
crocoddyl was being compiled with the same compilation options, and I had verified that get_reference actually does return a value. The problem happens when Boost.Python takes this object and uses it to print in the Python interpreter, or to assign it to a new variable.
I think you are not saying to Python that you are returning an internal reference ;)
That is true, currently a copy is being made. We don't want that.
@proyan I don't have this issue when I enable the vectorization option. Are you compiling with native all the Crocoddyl dependencies?
This is not related to the vectorization option.
Fixed with #865
I'm facing a memory alignment issue, which I hope @jmirabel or @jcarpent could help with:
When trying to access the `reference` object of `CostModelContactForce` in Python (which is of type `crocoddyl::FrameForceTpl<double>`, defined here), I sometimes encounter a segfault. It does not happen every time: sometimes it works, and sometimes it doesn't.

With @nmansard, we made a simple test based on the existing `utils/biped.py` file in order to check what is happening. We were able to reproduce the issue on both our systems.

The file is attached. When I try to access this `reference` object after the simulation, I get the segfault.

This is the line which leads to the segfault:
On further investigation, the error seems to occur while creating an instance of the pinocchio::ForceTpl object in Python.
`gdb` tells me that the segfault occurs after the code has left `get_reference` in the Python bindings, while the new Python instance of the `FrameForce` object (and consequently the `pinocchio::Force` object) is being created.

When I look at the assembly code of what is happening, it looks clear that this is a memory alignment issue. The segfault is coming from this line in Eigen:

where the 6-dimensional m_data object is being copied from the pinocchio::Force object to the Boost.Python instance (not sure which). The copy is performed by the following assembly instructions:
If I look at the address in `r14`, I see that it points to an address that is 128-bit aligned in memory. Thus, the first `mov` succeeds. However, the same is not true for the register `rbx`. The address ends with 78 and is not 128-bit aligned in memory, and I think this is the reason that the segfault appears.
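The alignment argument can be checked with simple arithmetic. This is a minimal sketch, assuming the trailing 78 is hexadecimal; the two addresses are made up for illustration, and only their low byte matters for a 16-byte (128-bit) check.

```python
def is_aligned(address, alignment=16):
    """Return True if `address` is a multiple of `alignment` bytes."""
    return address % alignment == 0


# Hypothetical addresses, not taken from the actual gdb session.
aligned_addr = 0x7FFFF7A4C060     # low byte 0x60 = 96, a multiple of 16
misaligned_addr = 0x7FFFF7A4C078  # low byte 0x78 = 120, i.e. 8 past a boundary

print(is_aligned(aligned_addr))     # True
print(is_aligned(misaligned_addr))  # False
print(misaligned_addr % 16)         # 8
```

An address ending in 0x78 is 8-byte aligned but not 16-byte aligned, which is consistent with an aligned 128-bit move faulting on it.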
`CostModelContactForce`, FrameForce, and pinocchio::Force all have the macro `EIGEN_MAKE_ALIGNED_OPERATOR_NEW` defined in their public section, and thus they are all always aligned in memory. However, the Boost.Python class that binds these quantities doesn't have this macro. Hence, I suspect that the object instance is being created at a misaligned memory location by Boost.Python.

My question is: since this issue does not appear with pinocchio::Force, what is being done differently there that is not being done here in our bindings? For reference, the class `ContactForce` is bound here: https://github.com/loco-3d/crocoddyl/blob/devel/bindings/python/crocoddyl/multibody/costs/contact-force.cpp#L91

In order to reproduce, please run this file in Python (https://gist.github.com/proyan/61bf620ee887b872e1c895af905940b3), and then run the following line:
It may or may not work (depending on the alignment, there is a 50% chance that some change in the code would make it fail). The assembly trace above is for the cases where it doesn't work.
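The 50% figure is what one would expect if the storage is only guaranteed to be 8-byte aligned. That guarantee is an assumption here, not something verified against Boost.Python. Under it, exactly half of the possible start addresses fall on a 16-byte boundary, as this small enumeration over a hypothetical address range shows:

```python
ALIGNMENT = 16   # bytes required by aligned 128-bit loads/stores
GRANULARITY = 8  # assumed: storage is only guaranteed to be 8-byte aligned

# Every 8-byte-aligned address in a hypothetical 4 KiB range.
candidates = range(0x1000, 0x2000, GRANULARITY)
ok = [a for a in candidates if a % ALIGNMENT == 0]

fraction_ok = len(ok) / len(candidates)
print(fraction_ok)  # 0.5
```

Any change that shifts the object by an odd multiple of 8 bytes flips it between the working and the crashing case, which would explain why the segfault is intermittent.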