mozilla / fxrecord

https://mozilla.github.io/fxrecord
Mozilla Public License 2.0

Intermittent crash in PssCaptureSnapshot #27

Open brennie opened 3 years ago

brennie commented 3 years ago

In unit tests in debug mode, we sometimes crash in integration_tests::test_resume_session_ok when calling into PssCaptureSnapshot. The stack trace looks like:

[0x0] ntdll!memcpy + 0x92
[0x1] ntdll!PsspHandleDumper + 0x1a1
[0x2] ntdll!PsspWalkHandleTable + 0x21e
[0x3] ntdll!PsspCaptureHandleInformation + 0x238
[0x4] ntdll!PssNtCaptureSnapshot + 0x373
[0x5] KERNELBASE!PssCaptureSnapshot + 0x1e

This issue does not happen when running the tests in release mode with --nocapture, which leads me to believe it is a race condition.

The frequency is also very strange. Sometimes the crash seems to happen 100% of the time when the test is run from the console, yet the test passes 100% of the time when run from WinDbg.

brennie commented 3 years ago

We should investigate enumerating processes with the ToolHelp32 API so that we don't have this weird edge case for unit tests.
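
A rough sketch of the tree walk this would need, not the actual fxrunner code. It assumes the snapshot rows have already been collected with CreateToolhelp32Snapshot and Process32FirstW/Process32NextW, and that only the th32ProcessID and th32ParentProcessID fields of each PROCESSENTRY32W are kept. The ProcessEntry type and the descendants function are hypothetical names made up for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// One row of a process snapshot (hypothetical type). On Windows the two
/// fields would be copied from PROCESSENTRY32W::th32ProcessID and
/// PROCESSENTRY32W::th32ParentProcessID.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ProcessEntry {
    pub pid: u32,
    pub parent_pid: u32,
}

/// Returns every descendant of `root` in breadth-first order, excluding
/// `root` itself.
pub fn descendants(entries: &[ProcessEntry], root: u32) -> Vec<u32> {
    // Index the snapshot by parent PID so each lookup is a map access.
    let mut children: HashMap<u32, Vec<u32>> = HashMap::new();
    for entry in entries {
        children.entry(entry.parent_pid).or_default().push(entry.pid);
    }

    // `seen` guards against cycles, which can appear when a PID is reused
    // and a stale parent PID points at an unrelated process.
    let mut seen = HashSet::new();
    seen.insert(root);

    let mut queue = VecDeque::from([root]);
    let mut found = Vec::new();
    while let Some(pid) = queue.pop_front() {
        if let Some(kids) = children.get(&pid) {
            for &kid in kids {
                if seen.insert(kid) {
                    found.push(kid);
                    queue.push_back(kid);
                }
            }
        }
    }
    found
}
```

One caveat with this approach: Windows does not update a process's recorded parent PID when the parent exits, and PIDs can be reused, so a parent/child match taken from the snapshot alone may be wrong for long-lived systems.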

brennie commented 2 years ago

I looked into using Job Objects to get the child processes, but Firefox's launcher creates an unnamed job so we cannot open it from fxrunner.

brennie commented 2 years ago

However, the launcher process sets JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on its job when --wait-for-browser is specified. With this limit set, all processes in the job are killed when the last handle to the job is closed.

We can just pass --wait-for-browser and then kill the launcher process! We don't have to enumerate the child processes at all. This will make things much simpler, and we can get rid of our dependency on PssCaptureSnapshot.
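
A minimal sketch of what that could look like with std::process, assuming fxrunner spawns the launcher directly and that the launcher holds the only handle to its job. The firefox_command and kill_launcher helpers and their parameters are hypothetical names, not existing fxrunner functions.

```rust
use std::ffi::OsStr;
use std::io;
use std::process::{Child, Command};

/// Builds the command used to start Firefox (hypothetical helper).
/// `--wait-for-browser` keeps the launcher process alive and, per the
/// comment above, makes it set JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on its job.
pub fn firefox_command<P: AsRef<OsStr>>(firefox_path: P, extra_args: &[&str]) -> Command {
    let mut cmd = Command::new(firefox_path);
    cmd.arg("--wait-for-browser").args(extra_args);
    cmd
}

/// Kills the launcher process and waits for it to exit (hypothetical helper).
/// Assumption: terminating the launcher closes the last handle to its job,
/// so the operating system tears down the browser processes for us.
pub fn kill_launcher(launcher: &mut Child) -> io::Result<()> {
    launcher.kill()?;
    // Reap the child so its process handle is released.
    launcher.wait()?;
    Ok(())
}
```

Usage would be roughly `let mut launcher = firefox_command(path, &args).spawn()?;` when the session starts and `kill_launcher(&mut launcher)?;` when it ends, with no process enumeration in between.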