Spark triggering massive amounts of "ThreadDump" safepoints, causing stutter/lag

pietro-lopes commented 2 months ago

Description

Some ATM10 players are having lag spikes, so I asked them to turn on safepoint and GC logging to see what is going on. It turns out this is happening:

With spark https://mclo.gs/q9v2q3W

Without spark https://mclo.gs/XpTnbfu

Reproduction Steps

Happens just by having spark installed (maybe the background profiler?)

It looks like it is happening to very few people; I can't reproduce it on Linux (PopOS) or Windows 10.

Expected Behaviour

Not sure. Is it suffering from safepoint bias (at least on Windows)?

Platform Information

Minecraft Version: 1.21.1
Platform Type: client
Platform Brand: NeoForge
Platform Version: Neo 21.1.47

Spark Version

1.10.97

Logs and Configs

No response

Extra Details

Here is a spark report from that player, in case you need PC/config specs: https://spark.lucko.me/fPQnwEqJ2K

SirYwell commented 2 months ago

It seems like the "Reaching safepoint" time is pretty high every now and then. It might be related to GC (I'm also seeing allocation stalls, which might indicate that there just isn't enough memory). Does that also happen with other GCs or with more memory assigned?

The way spark takes thread dumps without async-profiler requires threads to be at a safepoint, but safepoint bias is about less precise measurements rather than performance overhead or lag spikes.
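
For illustration, here is a rough sketch of what a safepoint-based sampler looks like in general. This is not spark's actual code; the class and method names are made up, and it only uses the standard ThreadMXBean API. On HotSpot, as far as I know, each dumpAllThreads call runs as a "ThreadDump" VM operation, so every sample first has to bring all threads to a safepoint. The sketch also times each dump, which gives a rough idea of how expensive that is on a given machine.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch, not spark's implementation.
public class SafepointSamplerSketch {

    // Takes up to `samples` thread dumps, sleeping `intervalMillis` between
    // them, and returns the average wall-clock cost of one dump in nanoseconds.
    static long averageDumpNanos(int samples, long intervalMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long totalNanos = 0;
        int taken = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            // No monitor or synchronizer details requested, only stack traces.
            ThreadInfo[] dump = threads.dumpAllThreads(false, false);
            totalNanos += System.nanoTime() - start;
            taken++;
            if (dump.length == 0) {
                break; // should not happen, the calling thread is always alive
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag
                break;
            }
        }
        return taken == 0 ? 0 : totalNanos / taken;
    }

    public static void main(String[] args) {
        long avg = averageDumpNanos(50, 4);
        System.out.println("average dump cost: " + avg + " ns");
    }
}
```

Running this with safepoint logging enabled should show one "ThreadDump" entry per sample, which would make it easy to compare the "Reaching safepoint" times against the affected players' logs.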

pietro-lopes commented 2 months ago

Another person https://spark.lucko.me/gghY5nDptL (for spec references)

With spark (this time I didn't ask them to enable the GC debug option, only safepoint) https://mclo.gs/NX3UTPO

(~21s of pause for ThreadDump alone, on an application running for 232s)

No spark https://mclo.gs/gETRTlg (now a total of ~2s of pause for an app running for ~236s)
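
For anyone who wants to redo these totals on their own logs, here is a minimal sketch that sums safepoint pause time per operation. It assumes the one-line-per-safepoint format that recent JDKs print with -Xlog:safepoint (Safepoint "Name", ..., Total: N ns); older JDKs use a different layout. The class name and the sample lines in main are made up for illustration and are not taken from the logs linked above. With the numbers above, 21s out of 232s is about 9% of the run spent paused, versus under 1% (2s out of 236s) without spark.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, assuming the JDK 17+ style safepoint log line.
public class SafepointPauseSummary {

    // Captures the operation name and the trailing "Total: <n> ns" value.
    private static final Pattern LINE =
            Pattern.compile("Safepoint \"([^\"]+)\".*Total:\\s*(\\d+) ns");

    // Sums the "Total" pause time in nanoseconds for each safepoint operation.
    static Map<String, Long> totalNanosByOperation(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                totals.merge(m.group(1), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return totals;
    }

    // Share of the run spent paused, as a percentage.
    static double pausePercent(double pauseSeconds, double uptimeSeconds) {
        return 100.0 * pauseSeconds / uptimeSeconds;
    }

    public static void main(String[] args) {
        // Illustrative lines only, not copied from the linked logs.
        List<String> sample = List.of(
            "[1.000s][info][safepoint] Safepoint \"ThreadDump\", Time since last: 4000000 ns, "
                + "Reaching safepoint: 900000 ns, Cleanup: 1000 ns, At safepoint: 99000 ns, Total: 1000000 ns",
            "[1.005s][info][safepoint] Safepoint \"ThreadDump\", Time since last: 4000000 ns, "
                + "Reaching safepoint: 1900000 ns, Cleanup: 1000 ns, At safepoint: 99000 ns, Total: 2000000 ns",
            "[1.200s][info][safepoint] Safepoint \"G1CollectForAllocation\", Time since last: 9000000 ns, "
                + "Reaching safepoint: 100000 ns, Cleanup: 1000 ns, At safepoint: 4899000 ns, Total: 5000000 ns");
        totalNanosByOperation(sample).forEach(
            (op, nanos) -> System.out.println(op + ": " + nanos / 1_000_000.0 + " ms"));
        System.out.println("with spark:    " + pausePercent(21, 232) + " %");
        System.out.println("without spark: " + pausePercent(2, 236) + " %");
    }
}
```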

pietro-lopes commented 2 months ago

Another player just had the same issue and fixed it by disabling the background profiler. We will ship that config option disabled by default for now.
