palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0

Timelock Server sets the default heap size to 512MB irrespective of the host types used #2107

Closed hsaraogi closed 7 years ago

hsaraogi commented 7 years ago

The main purpose of the default params was to set the min heap size equal to the max heap size, but they also pin the heap to an exact value. It would be worth customizing that value based on the number of cores available.

tpetracca commented 7 years ago

Is there more background internally here? My sense is that we should just target the default at whatever standard EC2 instance type we run against internally, and publish that information publicly as a recommendation.

schlosna commented 7 years ago

We need to be aware that increasing heap size increases pause time, and that will impact leader elections & Paxos voting rounds.

synical commented 7 years ago

Yep, the main concern here is to not have too big a heap, so that we aren't wasting CPU on GC.

hsaraogi commented 7 years ago

What if we have very powerful boxes like c4.8xlarge for timelock? Is setting the heap to 512MB still optimal? We have tested previously on c4.xlarge and come to this number. I think this should be a function of the memory available on the host.
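To make the "function of the memory available on the host" idea concrete, here is a minimal Java sketch. Everything in it is an assumption for illustration, not Timelock code: the `HeapSizing` class, the one-quarter fraction, and the 4 GiB cap are hypothetical choices, and only the 512MB floor comes from this thread. The instance memory figures (7.5 GiB for c4.xlarge, 60 GiB for c4.8xlarge) are the published EC2 sizes.

```java
// Hypothetical helper for illustration only; not part of AtlasDB or Timelock.
final class HeapSizing {
    static final long MIB = 1024L * 1024L;
    // Floor: the 512MB value that was tested on c4.xlarge.
    static final long MIN_HEAP_BYTES = 512L * MIB;
    // Cap: an assumed upper bound, chosen to keep GC pauses bounded on large hosts.
    static final long MAX_HEAP_BYTES = 4096L * MIB;

    /** Suggests a quarter of host memory (an assumed fraction), clamped to [512 MiB, 4 GiB]. */
    static long suggestHeapBytes(long hostMemoryBytes) {
        if (hostMemoryBytes <= 0) {
            throw new IllegalArgumentException("hostMemoryBytes must be positive");
        }
        long quarter = hostMemoryBytes / 4;
        return Math.max(MIN_HEAP_BYTES, Math.min(MAX_HEAP_BYTES, quarter));
    }

    /** Renders matching -Xms/-Xmx flags so that min heap equals max heap. */
    static String jvmFlags(long heapBytes) {
        long mib = heapBytes / MIB;
        return "-Xms" + mib + "m -Xmx" + mib + "m";
    }

    public static void main(String[] args) {
        long c4xlarge = 7680L * MIB;        // 7.5 GiB
        long c48xlarge = 60L * 1024L * MIB; // 60 GiB
        System.out.println("c4.xlarge:  " + jvmFlags(suggestHeapBytes(c4xlarge)));
        System.out.println("c4.8xlarge: " + jvmFlags(suggestHeapBytes(c48xlarge)));
    }
}
```

The clamp is the point of the sketch: a larger box gets a larger heap only up to a ceiling, so the heap does not grow without bound just because the host has memory to spare.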

schlosna commented 7 years ago

The short answer is that it depends on several factors -- allocation rate, promotion rate, garbage collector (probably want CMS or G1 for timelock), request rate, response times, and cluster size. There are many knobs that we could adjust, but note that minor GCs will stop the world, so you're effectively trying to minimize the impact of those.
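One way to put numbers on this before changing the default is to read the JVM's own collector counters. The sketch below uses the standard `java.lang.management` API; the `GcStats` class name and the output format are made up for illustration. Note that `getCollectionTime()` reports approximate accumulated collection time, which for concurrent collectors is not the same thing as stop-the-world pause time, so treat it as a rough signal rather than a pause measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only; not part of AtlasDB or Timelock.
final class GcStats {

    /** Sums the approximate accumulated collection time (ms) across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Builds one line per collector with its collection count and accumulated time. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
        System.out.println("total GC time (ms): " + totalCollectionMillis());
    }
}
```

Sampling these counters under a representative request load at two or three candidate heap sizes would give a rough comparison of GC overhead; GC logs remain the better source for individual pause durations.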

gsheasby commented 7 years ago

Just filed a PR over in the internal deployment module to make the heap size configurable. I don't believe there are any code changes needed on this side for that.

Do we want to do anything fancy like make it vary based on environment, or just leave it as configurable for now? Personally I'd do the latter - from our in-meeting discussion yesterday, it sounds like we're happy to have a fairly small heap as long as we can bump it if needed.

tpetracca commented 7 years ago

Glenn, I think we want to be highly prescriptive and make sure all of our defaults line up with the best known configuration. We can allow overriding those values for cases like the one Himangi describes above (bumping the EC2 instance type up considerably), but we want those to be the exception and not the rule.

gsheasby commented 7 years ago

Agreed that our defaults should line up with the best known/usual config. The configurability is an orthogonal tool, intended for fixing the unusual case where we need to fiddle with the value.

Do we have conviction that 512MB is the wrong default? If so, do we want to change the default (and to what), or introduce the "vary based on environment" logic?

gsheasby commented 7 years ago

The default has been changed from 512MB to 4GB. If we find that this default doesn't line up with the best known config, we'll make further adjustments.