Closed Spliterash closed 1 year ago
Thank you for reporting this issue, @Spliterash! Crashes in native libraries may be difficult to debug, but they always happen for a reason. I'll gladly analyze the crash log you provided. Perhaps I can deduce what happened.
Meanwhile, since you are using a Release native library, please attempt to reproduce the crash with the corresponding Debug native library from GitHub (probably "Linux64DebugSp_libbulletjme.so"). If you load the library using NativeLibraryLoader.loadLibbulletjme(), you'll also need to change the 3rd argument from "Release" to "Debug". The Debug natives are slow, but they include many extra checks that can catch trouble before it crashes the JVM.
I'll get back to you shortly with a crash-log analysis, and we'll take it from there.
Thanks for providing a link to the JVM crash log.
The crash occurred in Bullet, while advancing the simulation: btRaycastVehicle::updateFriction(float). That narrows it down to about 160 lines of code: https://github.com/stephengold/Libbulletjme/blob/38936628048e8dbc91dc443e1c250a500dec27bf/src/main/native/bullet3/BulletDynamics/Vehicle/btRaycastVehicle.cpp#L475-L643
Since Release native was used, the log doesn't report the line number or the native call stack. A log generated with Debug natives would surely provide more clues.
This doesn't resemble any issues I've seen recently, so I doubt upgrading to v17.4.0 will help. It shouldn't hurt though.
The current thread is "Server Physics Thread" and the Java call stack includes CompletableFuture$AsyncRun.run(), which makes me wonder about thread safety. Is there any possibility PhysicsSpace is being accessed from more than one thread?
CompletableFuture$AsyncRun.run() is called via CompletableFuture.runAsync({/*Task*/},physicsThread). This crash is rare and I don't know what is causing it. I installed the Debug version, and if it happens again, I will definitely share what it reports. Thanks again for your help.
Hey, @spliterash! One more clue I forgot to mention: btRaycastVehicle is the native implementation of vehicle dynamics, so the more PhysicsVehicle objects you have in your PhysicsSpace, the more times this code will execute per time step.
I got another error when spawning and despawning many cars: java: /home/travis/build/stephengold/Libbulletjme/src/main/native/glue/jmePhysicsSpace.cpp:235: static void jmePhysicsSpace::contactStartedCallback(btPersistentManifold* const&): Assertion `pm->getObjectType() == BT_PERSISTENT_MANIFOLD_TYPE' failed.
That's all I get: no hs_err_pid files, only a 5 GB dump.
I reviewed my code and found places where physics calls are made without checking the thread. Thanks for the note about threads.
I'm glad you got the trouble straightened out.
hs_err_pid7836.log I haven't made my fix yet, but I caught the crash with the Debug version.
Is it possible to somehow keep the JVM from crashing when physics crashes?
Given that the physics is implemented in native code, preventing JVM crashes is a difficult problem. One solution might be to use a physics engine that's entirely written in a JVM-based language---such as JBullet, for instance.
Concurrent access to a physics space (by different threads) could be prevented in various ways, for instance adding locks or runtime checks.
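As a rough illustration of the "runtime checks" idea, here is a minimal sketch that confines physics work to one thread and fails fast when anything else touches it. PhysicsThreadGuard and its methods are hypothetical names, not part of Libbulletjme; the executor reuses the "Server Physics Thread" name from the crash log, and tasks are routed with CompletableFuture.runAsync() as in the reporter's code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper (not a Libbulletjme class): one thread owns all physics access. */
final class PhysicsThreadGuard {
    private volatile Thread physicsThread; // set when the executor creates its worker
    private final ExecutorService executor;

    PhysicsThreadGuard() {
        executor = Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, "Server Physics Thread");
            thread.setDaemon(true);
            physicsThread = thread;
            return thread;
        });
    }

    /** Schedules a task on the physics thread; safe to call from any thread. */
    CompletableFuture<Void> runOnPhysicsThread(Runnable task) {
        return CompletableFuture.runAsync(task, executor);
    }

    /** Runtime check: call this first in every method that touches the physics space. */
    void assertOnPhysicsThread() {
        Thread current = Thread.currentThread();
        if (current != physicsThread) {
            throw new IllegalStateException(
                    "Physics accessed from thread \"" + current.getName() + "\"");
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

With a check like this, a stray call from another thread surfaces as a Java exception with a stack trace instead of a native crash some time later. A lock held around every access would be the other option mentioned above; it allows multiple threads but costs contention.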
I'll take a look at the latest crash log.
The original log (hs_err_pid237.log) was from a Linux host, and the new one (hs_err_pid7836.log) is from a Windows host. Why is that?
Since the address being dereferenced is 0x20, I suspect a NULL pointer is being dereferenced.
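The reasoning behind that guess, sketched as code: when native code reads a member through a null base pointer, the faulting address equals the member's offset, so it lands in the lowest page of the address space. FaultAddress is a hypothetical helper, and the 4096-byte page size is an assumption (typical for x86-64, not taken from the log).

```java
/** Hypothetical helper for reading crash logs; not part of any library. */
final class FaultAddress {
    /** Assumed page size in bytes; 4096 is typical on x86-64. */
    static final long PAGE_SIZE = 4096L;

    /**
     * Returns true if the faulting address lies in the first page, which is
     * what a null base pointer plus a small member offset would produce.
     */
    static boolean looksLikeNullPlusOffset(long faultAddress) {
        return Long.compareUnsigned(faultAddress, PAGE_SIZE) < 0;
    }
}
```

This is only a heuristic: it says the pointer was probably null, not which pointer or why.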
From the new log, I can't easily determine in which function the crash occurred. (The new crash may have a completely different root cause.) I'll use a disassembler to learn more.
To obtain a human-readable stack trace from a Windows JVM, you'd need to download the PDB file for the native library (Windows64DebugSp_bulletjme.pdb) to the same folder where the native DLL is located. If you plan further testing of Debug natives on Windows, please download the PDB.
Or perhaps fixing the threading issues you found will resolve this issue...
The original log (hs_err_pid237.log) was from a Linux host, and the new one (hs_err_pid7836.log) is from a Windows host. Why is that?
We tested it on another host.
From the new log, I can't easily determine in which function the crash occurred. (The new crash may have a completely different root cause.) I'll use a disassembler to learn more
The error was triggered the same way, by spawning and deleting a lot of cars.
To obtain a human-readable stack trace from a Windows JVM, you'd need to download the PDB file for the native library (Windows64DebugSp_bulletjme.pdb) to the same folder where the native DLL is located. If you plan further testing of Debug natives on Windows, please download the PDB.
"hs_err_pid18292.log" describes a crash unrelated to the previous two. The latest crash occurred in the JVM itself (not Libbulletjme) during an invocation of PhysicsSpace.addRigidBody().
Got it again. The only things still done outside the physics thread are control commands. Do I need to call accelerate, brake, and other control methods in the physics thread?
hs_err_pid238.log
Do i need call accelerate, brake and other control method in physic thread?
I believe you do.
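One possible way to do that without blocking the caller is to queue control commands and let the physics thread apply them just before each time step. The sketch below is an assumption about how an application might be structured, not Libbulletjme API: VehicleControls is a hypothetical stand-in for the vehicle object, and VehicleCommandQueue is a hypothetical helper.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for the vehicle object whose controls must run on the physics thread. */
interface VehicleControls {
    void accelerate(float force);
    void brake(float force);
}

/** Hypothetical helper: accepts commands from any thread, applies them on the physics thread. */
final class VehicleCommandQueue {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    /** Safe from any thread: only records the request. */
    void accelerate(VehicleControls vehicle, float force) {
        pending.add(() -> vehicle.accelerate(force));
    }

    /** Safe from any thread: only records the request. */
    void brake(VehicleControls vehicle, float force) {
        pending.add(() -> vehicle.brake(force));
    }

    /**
     * Call only from the physics thread, right before advancing the simulation.
     * Applies the queued commands in arrival order and returns how many ran.
     */
    int drain() {
        int applied = 0;
        Runnable command;
        while ((command = pending.poll()) != null) {
            command.run();
            ++applied;
        }
        return applied;
    }
}
```

Game or network threads would call accelerate() and brake() on the queue, while the physics loop calls drain() and then steps the simulation, so the vehicle is only ever touched from one thread.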
hs_err_pid238.log
At last, a stack trace showing details of a crash in Bullet! Still no line numbers ... I'm unsure why.
The crash occurred here: https://github.com/stephengold/Libbulletjme/blob/4d19c315fe9db52b00b9e940ea442318ccb59f68/src/main/native/bullet3/BulletDynamics/Vehicle/btRaycastVehicle.cpp#L625-L626
Making notes to myself now:
The "-" operator at the end of line 625 represents vector subtraction, which is implemented here: https://github.com/stephengold/Libbulletjme/blob/38936628048e8dbc91dc443e1c250a500dec27bf/src/main/native/bullet3/LinearMath/btVector3.h#L797-L800
The movss instruction that signaled SIGSEGV was dereferencing %rax, which should contain the virtual address of the v1 argument, but actually contains 0x38 (not a valid address). That address was passed from updateFriction(), and it was supposed to refer to wheelInfo.m_raycastInfo.m_contactPointWS.
wheelInfo.m_raycastInfo.m_contactPointWS was recently dereferenced in line 614, so I suspect the address got corrupted sometime between line 614 and line 625.
I reviewed https://github.com/stephengold/Libbulletjme/commit/ff15ca4faeed7f7efea0bf1e095db57ab8141694 but didn't see any way it could corrupt m_contactPointWS.
Hi. When I use the library, the following crash occurs periodically for no apparent reason:
Most likely the problem is in my code (maybe I don't close something), but how can I find out where? It just prints this and nothing else. Please help me figure out what could be causing the problem.
hs_err_pid237.log