Optimize checking gates

up2jj commented 7 months ago

We use actor-based gates pretty extensively during new feature rollup process and noticed significant slowdowns when flag contains large number of gates (>20k) and is checked multiple times.

Current approach relies on intermediary list created by Enum.filter/2. With new approach only reduce is used.

Benchmark below (5k gates):

--------------------------------------------------------------
$CACHE_ENABLED=true
$PERSISTENCE=ecto
$RDBMS=postgres
$PUBSUB_BROKER=phoenix_pubsub
$CI=
--------------------------------------------------------------
Elixir version:        1.16.2
Erlang/OTP version:    Erlang/OTP 26 [erts-14.0] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]
Logger level:          :error
Cache enabled:         true
Persistence adapter:   FunWithFlags.Store.Persistent.Ecto
RDBMS driver:          Ecto.Adapters.Postgres
Notifications adapter: FunWithFlags.Notifications.PhoenixPubSub
Anything using Redis:  false
--------------------------------------------------------------
Excluding tags: [:redis_pubsub, :redis_persistence]

...............................................................................................................................................Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.16.2
Erlang 26.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 18 s

Benchmarking new ...
Benchmarking old ...
Calculating statistics...
Formatting results...

Name                 ips        average  deviation         median         99th %
new        1.49 K      669.04 μs     ±1.22%      667.92 μs      692.98 μs
old        1.29 K      775.76 μs    ±12.13%      736.63 μs     1028.36 μs

Comparison:
new        1.49 K
old        1.29 K - 1.16x slower +106.73 μs

Memory usage statistics:

Name          Memory usage
new       468.84 KB
old       546.98 KB - 1.17x memory usage +78.14 KB

**All measurements for memory usage were the same**

tompave commented 7 months ago

Hey there, thank you for using the library and for the PR.

This seems a sensible change, thanks! I'll run some local benchmarks. In the meantime, do you think it would make sense to extend this change to the group gates?

As an aside, I'm glad to hear that the library scales to over 20k actor gates 😆, but in that case would it make sense to use a group gate instead? I doubt that you're managing 20k actors with the web UI, you must be doing so programmatically (maybe with a remote console). In that case, you could also flip some attribute on your users to make them part of a beta-testers group.

mentero commented 7 months ago

Hey, we often use fun_with_flags to coordinate feature rollout. We have data migration tasks that go over our actors one by one, do the data migration necessary for the feature to work, and if the migration is successful it enables the flag. After that, we usually go and create a boolean gate and start the code clean-up so we can remove it completely.

This usually worked fine, and we really enjoyed the fine-grained control over the release process, but in this particular case, the flag was heavily used inside a codebase, and we were taken by surprise by the accumulated performance hit of the cache lookup times as it was filled with more and more actor gates. This of course made a lot of sense when we discovered this was the issue and the hotfix was quite simple. We replaced all the actor gates with a single boolean gate (as the migration was done by that time).

We are working on a coldfix right now with a few ideas in mind such as moving the flag checks higher up the callstack or introducing yet another cache layer that will remember all the flags for the specified actor which should be a more suitable strategy for our case

up2jj commented 7 months ago

Hey there, thank you for using the library and for the PR.

This seems a sensible change, thanks! I'll run some local benchmarks. In the meantime, do you think it would make sense to extend this change to the group gates?

As an aside, I'm glad to hear that the library scales to over 20k actor gates 😆, but in that case would it make sense to use a group gate instead? I doubt that you're managing 20k actors with the web UI, you must be doing so programmatically (maybe with a remote console). In that case, you could also flip some attribute on your users to make them part of a beta-testers group.

@tompave taking into account that all these kinds of gates are stored in the same structure (list), large number of actors can affect checking of other gate types. I would like to take the same approach for groups, so will update the PR accordingly. Please let me know if there is anything else I can do

tompave commented 7 months ago

Hello, thank you for iterating on this.

I just wanted to align expectations on the timeline. This week I've been too busy to look at this, and I'll be travelling for the next few weeks. I'll look at this when I'm back, in May.

up2jj commented 6 months ago

Hello @tompave

I hope your trip went well 😄 Could I help you in any way with this issue?

tompave commented 3 months ago

Hey there, apologies for the long wait. I've tested this locally and it's good to be merged!

Thank you for identifying this area of improvement and for doing the work.

tompave commented 3 months ago

I've done a bit of commit surgery to remove some extra whitespace changes and get the PR to build with the latest CI config.

tompave / fun_with_flags

Optimize checking gates #172