Open pietro-lopes opened 2 months ago
It seems like the Reaching safepoint time is pretty high every now and then. It might be related to GC (I'm also seeing allocation stalls, which might indicate that memory just isn't sufficient). Does that also happen with other GCs or with more memory assigned?
The way spark takes thread dumps without async-profiler requires threads to be at a safepoint, but safepoint bias is more about less precise measurements than performance overhead/lag spikes.
Another affected player: https://spark.lucko.me/gghY5nDptL (for spec reference)
With spark (at that time I hadn't asked them to enable the GC debug option, only safepoint logging): https://mclo.gs/NX3UTPO
(~21s of pauses for ThreadDump alone, in an application running for 232s)
Without spark: https://mclo.gs/gETRTlg (now a total of ~2s of pauses for an app running for ~236s)
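For anyone who wants to total these pauses per safepoint operation themselves, here is a rough sketch. It assumes the log lines look like the JDK unified-logging safepoint output (`Safepoint "<op>", ..., Reaching safepoint: N ns, ..., Total: N ns`); that shape is my assumption and is not copied from the linked logs, so the regex may need adjusting for other JDK versions. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (not copied from the linked logs), e.g.:
# [12.3s][info][safepoint] Safepoint "ThreadDump", Time since last: 10 ns,
#   Reaching safepoint: 200 ns, At safepoint: 300 ns, Total: 500 ns
SAFEPOINT_RE = re.compile(
    r'Safepoint "(?P<op>[^"]+)"'
    r'.*?Reaching safepoint: (?P<reach>\d+) ns'
    r'.*?Total: (?P<total>\d+) ns'
)


def summarize_safepoints(lines):
    """Return {operation: (count, reaching_ns, total_ns)} for matching lines."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        match = SAFEPOINT_RE.search(line)
        if match is None:
            continue  # not a safepoint line in the assumed format
        entry = stats[match.group("op")]
        entry[0] += 1
        entry[1] += int(match.group("reach"))
        entry[2] += int(match.group("total"))
    return {op: tuple(values) for op, values in stats.items()}


def pause_fraction(pause_seconds, uptime_seconds):
    """Share of wall-clock time spent paused."""
    return pause_seconds / uptime_seconds
```

With the rounded numbers above, `pause_fraction(21, 232)` comes out at roughly 9% of the run paused with spark, versus `pause_fraction(2, 236)` at under 1% without it.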
Just now another player had the same issue and fixed it by disabling the background profiler. We will ship that config disabled by default for now.
Description
Some players on ATM10 are having lag spikes, so I asked them to turn on safepoint and GC logs to see what is going on, and it turns out this is happening:
With spark https://mclo.gs/q9v2q3W
Without spark https://mclo.gs/XpTnbfu
Reproduction Steps
Happens just by having spark installed (maybe the background profiler?)
It looks like it is happening to very few people; I can't reproduce it on Linux (PopOS) or Windows 10.
Expected Behaviour
Don't know. Is it suffering from safepoint bias (at least on Windows)?
Platform Information
Spark Version
1.10.97
Logs and Configs
No response
Extra Details
Here is a random spark report from that player, in case you need to grab PC/config specs: https://spark.lucko.me/fPQnwEqJ2K