[FEATURE] Flagd benchmarking

tcarrio commented 1 year ago

Requirements

I’ve heard memory usage cited as a concern for flagd, particularly with regard to the sidecar container pattern in the OpenFeature Operator. I would be interested in seeing a standard benchmark suite and accompanying documentation.

Background

Adopters

When we evaluated OpenFeature, one of the major drawbacks we listed for using the operator was the per-pod memory usage and its potential impact on the cluster and on our AWS bill. Regular benchmarking of the service would establish its expected resource usage, so adopters could easily determine the impact.
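As a rough illustration of how adopters could turn a benchmark figure into a cluster-wide impact, the total cost is the per-sidecar footprint multiplied by the number of pods. This Go sketch assumes a hypothetical 16MB per sidecar and 500 pods; neither is a measured flagd value, and `clusterOverheadMB` is a hypothetical helper.

```go
package main

import "fmt"

// clusterOverheadMB estimates the total memory consumed by flagd sidecars
// across a cluster, assuming one sidecar per pod, each using sidecarMB.
// sidecarMB is a hypothetical input that a benchmark suite would supply.
func clusterOverheadMB(pods int, sidecarMB float64) float64 {
	return float64(pods) * sidecarMB
}

func main() {
	// Hypothetical inputs, not measurements: 500 pods, 16MB per sidecar.
	const pods = 500
	const sidecarMB = 16.0
	fmt.Printf("%d pods x %gMB = %gMB reserved for flagd sidecars\n",
		pods, sidecarMB, clusterOverheadMB(pods, sidecarMB))
}
```

With a published per-sidecar figure, the same multiplication gives the memory that has to be provisioned, and therefore paid for, across the cluster.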

Regression Analysis

As change requests are merged, memory or CPU requirements could increase. Standard benchmarks, preferably automated, would help identify regressions associated with specific changes and releases.

Regression Example

HypotheticalPullRequest#892 increased memory usage by 20% and slowed processing of flags by 12%. This was flagged in CI and triaged. Upon further inspection, it was found that a struct was being copied to the heap on every execution of the new implementation.
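A regression like this can be caught with an allocation count rather than a timing, which makes it less sensitive to noisy shared runners. The sketch below uses the standard library's `testing.AllocsPerRun` to compare two functions: one stores a pointer to a new struct in a package-level variable, which forces a heap allocation on every call, and one copies the struct by value, which needs none. `evalContext`, `withPointer`, and `withValue` are illustrative names, not flagd code.

```go
package main

import (
	"fmt"
	"testing"
)

// evalContext is a hypothetical struct standing in for whatever the
// pull request in the example ended up allocating on the heap.
type evalContext struct {
	FlagKey string
	Variant string
	Weight  int
}

// Package-level sinks keep the compiler from optimizing the work away.
var (
	ptrSink   *evalContext
	valueSink evalContext
)

// withPointer takes the address of a new struct and stores it globally, so
// the struct escapes and is allocated on the heap on every call.
func withPointer(weight int) {
	ptrSink = &evalContext{FlagKey: "new-welcome-banner", Variant: "on", Weight: weight}
}

// withValue copies the struct by value into a global; no heap allocation.
func withValue(weight int) {
	valueSink = evalContext{FlagKey: "new-welcome-banner", Variant: "on", Weight: weight}
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	ptrAllocs := testing.AllocsPerRun(1000, func() { withPointer(42) })
	valAllocs := testing.AllocsPerRun(1000, func() { withValue(42) })
	fmt.Printf("pointer: %.0f allocs/op, value: %.0f allocs/op\n", ptrAllocs, valAllocs)
}
```

In a real test suite, the same comparison could assert an upper bound on allocations for a hot code path, so an unexpected escape to the heap fails CI directly.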

Sidecar Impact Examples

Increased memory usage in each pod translates directly into higher resource costs, although the increase can be fairly negligible depending on the architecture of your applications. The table below lists a pod's provisioned memory alongside columns showing what percentage of that memory flagd would consume at a few example memory footprints.

Pod memory	flagd at 1MB	flagd at 4MB	flagd at 16MB
16MB	6.25%	25%	100%
32MB	3.125%	12.5%	50%
64MB	1.5625%	6.25%	25%
128MB	0.78125%	3.125%	12.5%
256MB	0.390625%	1.5625%	6.25%
512MB	0.195312%	0.78125%	3.125%
1024MB	0.097656%	0.390625%	1.5625%
2048MB	0.048828%	0.195312%	0.78125%
4096MB	0.024414%	0.097656%	0.390625%
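Each cell in the table is flagd's memory divided by the pod's provisioned memory, expressed as a percentage. This Go sketch regenerates the table from the same illustrative flagd sizes; `flagdShare` is a hypothetical helper name.

```go
package main

import "fmt"

// flagdShare returns flagd's memory as a percentage of the pod's
// provisioned memory, which is how each cell in the table is computed.
func flagdShare(podMB, flagdMB float64) float64 {
	return flagdMB / podMB * 100
}

func main() {
	podSizes := []float64{16, 32, 64, 128, 256, 512, 1024, 2048, 4096}
	flagdSizes := []float64{1, 4, 16} // illustrative values, not measurements
	for _, pod := range podSizes {
		fmt.Printf("%gMB", pod)
		for _, f := range flagdSizes {
			fmt.Printf("\t%g%%", flagdShare(pod, f))
		}
		fmt.Println()
	}
}
```

Because `%g` prints the shortest exact representation, a few cells show more digits than the table above; for example, 1MB of a 512MB pod prints as 0.1953125%.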

Disclaimer

Warning: These are illustrative examples, not results from any existing benchmark, so do not use them to estimate actual resource utilization. The flagd memory figures are arbitrary values chosen only to show the relative impact.

beeme1mr commented 1 year ago

There is already a basic benchmark running on each PR, as well as nightly.

Currently, it focuses on operations per second, but the results are somewhat inconsistent because it runs on shared GitHub infrastructure. Perhaps we should expand the test to include memory usage.
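Expanding a Go benchmark to cover memory could be as simple as requesting allocation statistics alongside throughput. The sketch below uses the standard library's `testing.Benchmark` and `B.ReportAllocs` so it runs outside `go test`; `resolveFlag` is a hypothetical stand-in for flag evaluation, not flagd's API. Note that B/op and allocs/op describe allocations made by the benchmarked code, not the container's resident memory, which would have to be measured separately. Allocation counts do tend to be more stable than timings on shared runners, since they don't depend on CPU contention.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// resultSink keeps the compiler from discarding the benchmarked call.
var resultSink string

// resolveFlag is a hypothetical stand-in for a flag evaluation; a real
// benchmark would call flagd's evaluator instead.
func resolveFlag(key string) string {
	if strings.HasPrefix(key, "beta-") {
		return "on"
	}
	return "off"
}

func main() {
	keys := []string{"beta-checkout", "new-welcome-banner"}
	// ReportAllocs asks for memory statistics alongside the timing.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			resultSink = resolveFlag(keys[i%len(keys)])
		}
	})
	opsPerSec := float64(res.N) / res.T.Seconds()
	fmt.Printf("%.0f ops/sec, %d B/op, %d allocs/op\n",
		opsPerSec, res.AllocedBytesPerOp(), res.AllocsPerOp())
}
```

Inside a regular `_test.go` file, the same loop would live in a `BenchmarkXxx(b *testing.B)` function, and `go test -bench . -benchmem` would report the memory columns for every benchmark.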

Could you please clarify the sidecar impact example? I'm not sure what you're trying to convey with the table.

beeme1mr commented 1 year ago

Loosely relates to #192

beeme1mr commented 1 year ago

Test results are not available.

open-feature / flagd