It turns out our data harvester logs show it attempting to copy the same subject over and over again, crashing with a segfault each time. Here's a representative snippet:
Warning [CompositeResourceRetriever.cpp:96] [CompositeResourceRetriever::retrieve] All ResourceRetrievers registered for this schema failed to retrieve the URI 'file:///tmp/tmpu1bq7e4i/Geometry/Rib1R.vtp.ply' (tried 1).
Warning [MeshShape.cpp:493] [MeshShape::loadMesh] Failed loading mesh 'file:///tmp/tmpu1bq7e4i/Geometry/Rib1R.vtp.ply' with ASSIMP error 'Unable to open file "file:///tmp/tmpu1bq7e4i/Geometry/Rib1R.vtp.ply".'.
Warning [LocalResource.cpp:48] [LocalResource::constructor] Failed opening file '/tmp/tmpu1bq7e4i/Geometry/Sternum.vtp.ply' for reading: No such file or directory
Warning [CompositeResourceRetriever.cpp:96] [CompositeResourceRetriever::retrieve] All ResourceRetrievers registered for this schema failed to retrieve the URI 'file:///tmp/tmpu1bq7e4i/Geometry/Sternum.vtp.ply' (tried 1).
Warning [MeshShape.cpp:493] [MeshShape::loadMesh] Failed loading mesh 'file:///tmp/tmpu1bq7e4i/Geometry/Sternum.vtp.ply' with ASSIMP error 'Unable to open file "file:///tmp/tmpu1bq7e4i/Geometry/Sternum.vtp.ply".'.
WARNING! Creating a WeldJoint as an intermediate (non-root) joint. This will cause the gradient computations to run with slower algorithms. If you find a way to remove this WeldJoint, things should run faster.
WARNING! Creating a WeldJoint as an intermediate (non-root) joint. This will cause the gradient computations to run with slower algorithms. If you find a way to remove this WeldJoint, things should run faster.
Signal received: 48, errno: 0
################################################################################
Stack trace:
################################################################################
/home/users/keenon/.local/lib/python3.9/site-packages/_awscrt.cpython-39-x86_64-linux-gnu.so(aws_backtrace_print+0x4f) [0x7f667e50f7ef]
/home/users/keenon/.local/lib/python3.9/site-packages/_awscrt.cpython-39-x86_64-linux-gnu.so(+0x7dca3) [0x7f667e45cca3]
/lib64/libpthread.so.0(+0xf630) [0x7f6685f04630]
/home/users/keenon/.local/lib/python3.9/site-packages/nimblephysics_libs/_nimblephysics.so(_ZN4dart8dynamics5Joint7setNameERKSsb+0x15) [0x7f667288c965]
/home/users/keenon/.local/lib/python3.9/site-packages/nimblephysics_libs/_nimblephysics.so(_ZN4dart12biomechanics11createJointESt10shared_ptrINS_8dynamics8SkeletonEEPNS2_8BodyNodeEPN8tinyxml210XMLElementES9_N5Eigen9TransformIdLi3ELi1ELi0EEESC_SsSsRKS1_INS_6common17ResourceRetrieverEE+0x9fb) [0x7f6672c3df9b]
I'm not sure why Nimble is segfaulting on this user's OpenSim skeleton when it tries to map the markerset to a standard Rajagopal skeleton (the trace above ends in dart::dynamics::Joint::setName, called from dart::biomechanics::createJoint). Chasing down every individual segfault here is a bottomless pit, though, so we should also protect our data harvester from segfaults in general.
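This PR splits the offending section out into a separate process and checks its exit code. In effect it's a fancier version of a try/catch: a segfault now kills only the child process, and the harvester sees a non-zero exit code instead of going down with it.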
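For readers unfamiliar with the pattern, here is a minimal sketch of process isolation with an exit-code check. It is not the code in this PR: the function name run_isolated, the timeout value, and the idea of passing the work as a Python snippet are all assumptions made for illustration, using only the standard library.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> bool:
    """Run a Python snippet in a child process (hypothetical helper).

    Returns True only if the child exits cleanly with code 0.
    A crash in the child cannot take down the calling process.
    """
    try:
        result = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child hung; subprocess.run has already killed it.
        return False
    if result.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N
        # (for example, -11 for SIGSEGV on Linux).
        print(f"child killed by signal {-result.returncode}", file=sys.stderr)
    return result.returncode == 0
```

The caller can then skip or flag a subject when run_isolated returns False, rather than retrying it forever.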
I have not yet tested this in production; ideas for how to test it are welcome!