mockingbirdnest / Principia

𝑛-Body and Extended Body Gravitation for Kerbal Space Program
MIT License

A crash occurred while entering orbit (Visual C++ Runtime Library) #4038

Open Helium-ion opened 4 months ago

Helium-ion commented 4 months ago

The game crashes just as the ship is about to enter a circular orbit around Kerbin, and then this dialog box appears. Here's the glog. No files were produced in the crash folder.

How to reproduce:

  1. Enter this save.
  2. Load the archive named A1 or B1 if you want it to be faster.
  3. Try to get the ship into the Kerbin orbit.
  4. Enjoy crashing when you're nearing completion. :)

Additional Information:

  1. Only Principia and two DLCs have been added to the game.
  2. If Principia is removed, the game does not crash when entering orbit.
  3. As its name suggests, this save was created on 20240712. It uses the latest Principia, and the Principia version has not been changed since.

pleroy commented 4 months ago

This error is typically due to a corruption or conflict in the Visual C++ runtime. Please make sure that you have a single version of the Microsoft Visual C++ 2015-2022 Redistributable installed, and that it's at least version 14.38.33130. If you do have multiple versions, I suggest uninstalling them all and installing the latest version from the Microsoft site https://aka.ms/vs/17/release/vc_redist.x64.exe.

Helium-ion commented 4 months ago

Thanks, but sadly it doesn't work and the same crash is still happening.

pleroy commented 4 months ago

This is going to be hard to analyse because it's very likely to be related to the environment on your computer. You might want to send us a PML file by following the instructions here. This might help find the anomaly, although that's not guaranteed.

Helium-ion commented 4 months ago

I'm not very familiar with it. If I did it correctly, this should be the expected PML file. Hope it helps.

Helium-ion commented 4 months ago

Breakthrough: The game doesn't crash if the Kerbin Fixed Coordinate System is not used. The problem seems to happen when drawing a path that doesn't strike the ground. (It seems so to me, but I can't guarantee it, since I don't understand how Principia works.)

pleroy commented 4 months ago

You did record the .pml file correctly, but I am still clueless about what's happening. You probably have a log file at C:\Users\Administrator\AppData\LocalLow\Squad\Kerbal Space Program\Player.log. If you could reproduce the crash and send us that log file, that would be nice.

(I have a hunch that we have a bug in the interchange between C# and C++ when computing the intersection of the trajectories with the ground. We had a problem that was vaguely similar in #3872, and I don't think that we got to the bottom of it.)

pleroy commented 4 months ago

I was able to load your save, but I couldn't reproduce the crash (I tried various settings of the prediction in the hope of stress-testing the terrain collision algorithm).

Helium-ion commented 3 months ago

I'll have time soon to try to get the log file and send it. Over the past few days I carefully avoided using the coordinate system, and the game worked fine with no crashes. It seems that the error only happens on stable orbits. On a flyby or impact trajectory, the coordinate system does not cause a crash.

Helium-ion commented 3 months ago

This is the .log file from the time it entered Kerbin's orbit a few days ago.

An additional example: an artificial satellite in orbit around the Mun. Switching to the Mun Fixed Coordinate System also causes the game to crash. This is the .log file for this crash.

pleroy commented 3 months ago

For what it's worth I am attaching below the stack of the last load of the C++ runtime, obtained from the .pml file. The version of the runtime is 14.40.33810.0 and we are in optional. As expected, the call happens on the main thread of KSP. The crash happens 2.2 s later.

https://github.com/mockingbirdnest/Principia/blob/b38191778528bbddd92e3c25eb7c8112df81726d/ksp_plugin/interface_collision.cpp#L66-L70

The interesting part is that the stack is still at the same addresses in Principia and optional code when the (main) thread exits.

pleroy commented 3 months ago

Could you give us a journal by following the instructions here? At this point I am convinced that the problem has to do with the computation of collisions, which we only do in the Kerbin surface frame. I would like to see the last interactions between the C# and C++ code before the crash.

pleroy commented 3 months ago

The .pml file has many pairs of Thread Create/Thread Exit where the thread creation is at:

https://github.com/mockingbirdnest/Principia/blob/b38191778528bbddd92e3c25eb7c8112df81726d/ksp_plugin/interface_collision.cpp#L134-L143
https://github.com/mockingbirdnest/Principia/blob/b38191778528bbddd92e3c25eb7c8112df81726d/ksp_plugin/interface_collision.cpp#L54-L54
https://github.com/mockingbirdnest/Principia/blob/b38191778528bbddd92e3c25eb7c8112df81726d/base/not_null_body.hpp#L207-L207
https://github.com/mockingbirdnest/Principia/blob/b38191778528bbddd92e3c25eb7c8112df81726d/base/push_pull_callback_body.hpp#L89-L89

These are the computations of the collisions. Note that presumably all the corresponding calls to CollisionDeleteExecutor succeeded until the last one.

Helium-ion commented 3 months ago

The journal file for the Kerbin orbit is so large that I couldn't upload it, despite trying several ways. I suggest you take a look at the Mun orbit example first. This is the journal file for the additional example in Mun orbit.

Interestingly, the game behaves even more strangely when journal recording is turned on. In the case of the satellite in Mun orbit, the crash no longer occurs immediately. I had to switch the camera and the coordinate system several times before I was able to reproduce the crash. (Maybe it's just because the computer is old and the game is slow.)

pleroy commented 3 months ago

The journal indeed ends with the deletion of an executor:

E0721 11:58:22.789089 46784 player.cpp:108] index: 1131303
[principia.journal.serialization.CollisionNewPredictionExecutor.extension] { in { plugin: 2665139678032 celestial_index: 2 sun_world_position { x: -11324901998.100574 y: 4823.8598558862177 z: -7531417763.9836693 } vessel_guid: "9197a9f0-55df-4eec-a4e2-67b68166596f" max_points: 64 } }
[principia.journal.serialization.CollisionNewPredictionExecutor.extension] { return { result: 2665086787776 } }
E0721 11:58:22.789089 46784 player.cpp:108] index: 1131304
[principia.journal.serialization.CollisionGetLatitudeLongitude.extension] { in { executor: 2665086787776 } }
[principia.journal.serialization.CollisionGetLatitudeLongitude.extension] { out { latitude_in_degrees: 0 longitude_in_degrees: 0 } return { result: false } }
E0721 11:58:22.789616 46784 player.cpp:75] Unpaired method:
[principia.journal.serialization.CollisionDeleteExecutor.extension] {
  in {
    plugin: 2665139678032
    executor: 2665086787776
  }
}

but the crash doesn't reproduce on my machine.
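For readers unfamiliar with the journal, here is one way to read the "Unpaired method" entry. This assumes that the journal writes an `in` record when an interface call starts and a `return` record only once the call finishes normally. Under that assumption a call that crashes leaves an `in` with no matching `return`. The following is an illustrative C++ sketch, not Principia's journal code; `ScopedEntry` and `UnpairedMethods` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical journal writer: appends "in <name>" on construction and
// "return <name>" on destruction.  If the process dies in between, only the
// "in" line is left behind.
class ScopedEntry {
 public:
  ScopedEntry(std::vector<std::string>& journal, std::string name)
      : journal_(journal), name_(std::move(name)) {
    journal_.push_back("in " + name_);
  }
  ~ScopedEntry() { journal_.push_back("return " + name_); }

 private:
  std::vector<std::string>& journal_;
  std::string const name_;
};

// Returns the names of the calls that have an "in" line but no "return" line.
// Assumes that calls are properly nested, as they are on a single thread.
std::vector<std::string> UnpairedMethods(
    std::vector<std::string> const& journal) {
  std::vector<std::string> open;
  for (std::string const& line : journal) {
    if (line.rfind("in ", 0) == 0) {
      open.push_back(line.substr(3));
    } else if (line.rfind("return ", 0) == 0) {
      if (!open.empty() && open.back() == line.substr(7)) {
        open.pop_back();
      }
    }
  }
  return open;
}
```

Applied to the excerpt above, such a scan would report only CollisionDeleteExecutor, which is consistent with the crash happening inside that call or in something it triggers.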

Things that we have been looking into:

  1. The computation of collisions captures the prediction by reference. If the prediction was changed (replaced by the prognostication) while the collisions are computed, we could have undefined behaviour. However that doesn't seem to happen: the asynchronous integration of vessels is between Precalc and Early, but the computation of markers is much later, in BetterLateThanNever in LateUpdate.
  2. The processor doesn't have FMA. If we had an unprotected FMA call, it could execute an illegal instruction. Hacking the code to simulate failures on FMA didn't uncover anything. (Tried in game and in the journal.)
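The lifetime hazard in item 1 can be illustrated with a minimal sketch. It shows one common way to rule out a dangling reference, assuming shared ownership is acceptable: the deferred computation holds a `std::shared_ptr` to the trajectory instead of a reference. `Trajectory` and `MakeMinAltitudeTask` are hypothetical names, not Principia types, and this is not how the plugin is actually structured.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a predicted trajectory.
struct Trajectory {
  std::vector<double> altitudes;
};

// Returns a deferred computation of the minimum altitude.  The lambda shares
// ownership of the trajectory, so replacing the caller's pointer later cannot
// invalidate the data that the computation reads.
std::function<double()> MakeMinAltitudeTask(
    std::shared_ptr<Trajectory const> trajectory) {
  assert(trajectory != nullptr && !trajectory->altitudes.empty());
  return [trajectory = std::move(trajectory)]() {
    double min = trajectory->altitudes.front();
    for (double const altitude : trajectory->altitudes) {
      if (altitude < min) {
        min = altitude;
      }
    }
    return min;
  };
}
```

With a capture by reference instead, replacing the prediction before the task runs would leave the task reading a destroyed object, which is the undefined behaviour considered in item 1.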

pleroy commented 3 months ago

#3872 seems to be another occurrence of the same problem. The INFO log for that other issue has an interesting piece of information, viz., that the processor doesn't have FMA support:

Excerpt from the #3872 log:

E0813 19:26:50.544476 10880 interface.cpp:737] Running on AMD Athlon(tm) II X2 250 Processor               (AuthenticAMD)
E0813 19:26:50.567412 10880 interface.cpp:739] with FPU SSE SSE2 SSE3

Excerpt from the log for this issue:

E0713 18:42:00.948038  5268 interface.cpp:737] Running on Intel(R) Celeron(R) CPU 3855U @ 1.60GHz          (GenuineIntel)
E0713 18:42:00.984560  5268 interface.cpp:739] with FPU SSE SSE2 SSE3 SSE4_1

This might indicate that the problem is related to FMA. I have to believe that most of our users are using processors that have FMA (FMA was introduced by AMD in 2012 and by Intel in 2013) so it is possible that we misbehave (or that the Visual C++ Runtime misbehaves) on processors that don't have FMA support. Unfortunately, we don't have easy access to machines without FMA, so if this is the root cause it's going to be hard to investigate.
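To make the "unprotected FMA call" hypothesis concrete, here is a minimal sketch of a protected one: the FMA instruction is only reached after a runtime check of the processor's capabilities, with a plain multiply-add as the fallback. The names `CpuHasFma`, `HardwareMulAdd` and `MulAdd` are hypothetical and this is not Principia's code. It is also simplified: a complete check would confirm that the operating system saves the AVX register state, which the MSVC branch below does not do.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_X64 1
#include <immintrin.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif
#else
#define SKETCH_X64 0
#endif

// Reports whether the processor advertises FMA3 support.
bool CpuHasFma() {
#if SKETCH_X64 && defined(_MSC_VER)
  int registers[4];
  __cpuid(registers, 1);
  return (registers[2] & (1 << 12)) != 0;  // CPUID leaf 1, ECX bit 12.
#elif SKETCH_X64 && (defined(__GNUC__) || defined(__clang__))
  return __builtin_cpu_supports("fma") != 0;
#else
  return false;
#endif
}

#if SKETCH_X64
// Only this function is compiled with FMA enabled; calling it on a processor
// without FMA would raise an illegal instruction.
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("fma")))
#endif
double HardwareMulAdd(double a, double b, double c) {
  return _mm_cvtsd_f64(
      _mm_fmadd_sd(_mm_set_sd(a), _mm_set_sd(b), _mm_set_sd(c)));
}
#endif

// Computes a * b + c, using the FMA instruction only when it is available.
double MulAdd(double a, double b, double c) {
#if SKETCH_X64
  static bool const use_fma = CpuHasFma();
  if (use_fma) {
    return HardwareMulAdd(a, b, c);
  }
#endif
  return a * b + c;
}
```

The two paths can differ in the last bit because the fused operation rounds once instead of twice. A call site that skips the check and reaches `HardwareMulAdd` directly is the kind of unprotected use that would only fail on processors such as the two listed above.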