rqlite / helm-charts

Helm charts for rqlite

Investigate autosetting GOMAXPROCS and GOMEMLIMIT env vars #27

Open · jtackaberry opened this issue 3 weeks ago

jtackaberry commented 3 weeks ago

See https://github.com/rqlite/rqlite/discussions/1804

otoolep commented 3 weeks ago

I haven't read the blog post, but I am not sure I would touch GOMAXPROCS. AFAIK it just does the right thing if not explicitly set:

https://pkg.go.dev/runtime#GOMAXPROCS

As for GOMEMLIMIT, well, tweaking the GC can be subtle.

rqlite can definitely OOM if one sends it many large requests, and I've gotten some reports of OOMs recently. The best way to deal with that is to lower GOGC and see if that helps (it did in the recent cases). See Go's GC guide for more details. I also recently added a note about all this to the rqlite docs.
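To illustrate, lowering GOGC is just an environment variable on the rqlite container. A minimal sketch (the spec fragment below is illustrative only, not taken from this chart's actual templates):

```yaml
# Illustrative container spec fragment (not this chart's actual template):
# lower GOGC from its default of 100 so the runtime collects garbage more
# aggressively, trading some CPU for a smaller heap.
containers:
  - name: rqlite
    image: rqlite/rqlite
    env:
      - name: GOGC
        value: "50"   # example value; tune for the workload
```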

So I'm not saying there isn't a case for doing some of this automatically, but, just as with the JVM, GC tuning always looks simple at the start and can have unintended consequences.

jtackaberry commented 3 weeks ago

> I haven't read the blog post, but I am not sure I would touch GOMAXPROCS. AFAIK it just does the right thing if not explicitly set:

Not in the context of cgroups limits. The blog is worth reading, but it also links out to https://github.com/golang/go/issues/33803. It looks like cgroups-aware behavior may land at some point in the future, and while it's not in place now, the theory involved looks sound.
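Until something lands upstream, one option that needs no code change is the Kubernetes downward API, which can project the container's CPU limit into GOMAXPROCS. A sketch, not this chart's actual template:

```yaml
# Sketch: derive GOMAXPROCS from the container's own CPU limit.
# With divisor "1" the downward API rounds the limit up to whole cores,
# e.g. a 2500m CPU limit becomes GOMAXPROCS=3.
env:
  - name: GOMAXPROCS
    valueFrom:
      resourceFieldRef:
        resource: limits.cpu
        divisor: "1"
```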

> As for GOMEMLIMIT, well, tweaking the GC can be subtle.

Consider a Go process running on a host with 64GB of RAM, 60GB of which is free, but with a pod memory limit of 2GB. Go is going to act as though it's running on a system with 60GB free, and it won't GC as aggressively as it would if it only had 2GB free, so it will keep humming along accumulating garbage until it's too late: the cgroups limit kicks in and the kernel OOM-kills the process. Setting GOMEMLIMIT to the cgroups-imposed limit will induce GC earlier, which seems more desirable than summary execution.
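For what it's worth, the memory limit can be projected the same way: GOMEMLIMIT accepts a bare byte count, which is exactly what the downward API produces with divisor "1". Again a sketch, not this chart's actual template:

```yaml
# Sketch: set GOMEMLIMIT to the container's own memory limit, in bytes.
# GOMEMLIMIT accepts a bare integer byte count (or suffixes like "2GiB"),
# so the raw downward API value works as-is.
env:
  - name: GOMEMLIMIT
    valueFrom:
      resourceFieldRef:
        resource: limits.memory
        divisor: "1"
```

The one wrinkle is that the downward API can't express "some fraction of the limit", so this sets GOMEMLIMIT to exactly the pod limit rather than leaving headroom for non-heap memory; a templated value would be needed for that.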

> rqlite can definitely OOM if one sends it many large requests, and I've gotten some reports of OOMs recently. The best way to deal with that is to lower GOGC and see if that helps (it did in the recent cases). See Go's GC guide for more details. I also recently added a note about all this to the rqlite docs.

Dialing back GOGC isn't going to help when the runtime thinks it has 60GB of free memory. (Well, not unless you set it to like 2 :))

GOMEMLIMIT is reasonable. At the day job we do set it for some Go workloads in Kubernetes for this very reason. I've just never done it for GOMAXPROCS, but I'm not seeing any fault in the theory.

> So I'm not saying there isn't a case for doing some of this automatically, but, just as with the JVM, GC tuning always looks simple at the start and can have unintended consequences.

Which is actually why I'm more inclined to change GOMEMLIMIT than GOGC: it seems to me GOGC has subtler and more complex effects than GOMEMLIMIT.

otoolep commented 3 weeks ago

That is interesting, and good to read, thanks. I'll keep an eye on this issue and see what you decide to do (if anything).