Closed up2jj closed 3 months ago
Hey there, thank you for using the library and for the PR.
This seems a sensible change, thanks! I'll run some local benchmarks. In the meantime, do you think it would make sense to extend this change to the group gates?
As an aside, I'm glad to hear that the library scales to over 20k actor gates 😆, but in that case would it make sense to use a group gate instead? I doubt that you're managing 20k actors with the web UI, you must be doing so programmatically (maybe with a remote console). In that case, you could also flip some attribute on your users to make them part of a beta-testers group.
Hey, we often use fun_with_flags to coordinate feature rollout. We have data migration tasks that go over our actors one by one, run the data migration the feature needs, and enable the flag for that actor if the migration succeeds.
After that, we usually go and create a boolean gate and start the code clean-up so we can remove it completely.
This usually worked fine, and we really enjoyed the fine-grained control over the release process. In this particular case, though, the flag was checked in many places throughout the codebase, and we were taken by surprise by the accumulated performance hit: cache lookups got slower as the flag filled up with more and more actor gates. This of course made a lot of sense once we discovered it was the issue, and the hotfix was quite simple. We replaced all the actor gates with a single boolean gate (the migration was done by that time).
We are working on a coldfix right now with a few ideas in mind, such as moving the flag checks higher up the call stack, or introducing yet another cache layer that remembers all the flags for a given actor, which should be a more suitable strategy for our case.
@tompave since all these kinds of gates are stored in the same structure (a list), a large number of actor gates can also slow down checks of the other gate types. I would like to take the same approach for groups, so I will update the PR accordingly. Please let me know if there is anything else I can do.
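To make the point about the shared list concrete, here is a minimal sketch, not the library's actual code. It assumes gates are plain maps held in one flat list (the map shape and the `GateSketch` module are made up for illustration). It contrasts filtering by gate type before reducing, which the PR description says the current approach does with Enum.filter/2, against a single reduce that checks the type inside the reducer.

```elixir
defmodule GateSketch do
  # Hypothetical gate shape: %{type: :actor | :group | :boolean, id: term, enabled: boolean}
  # This is an illustration only, not the struct used by fun_with_flags.

  # Two steps: Enum.filter/2 first allocates an intermediate list that holds
  # every actor gate, then Enum.reduce_while/3 scans that new list.
  def check_with_filter(gates, actor_id) do
    gates
    |> Enum.filter(fn gate -> gate.type == :actor end)
    |> Enum.reduce_while(:ignore, fn gate, acc ->
      if gate.id == actor_id do
        {:halt, {:ok, gate.enabled}}
      else
        {:cont, acc}
      end
    end)
  end

  # One step: the gate type is matched inside the reducer, so no
  # intermediate list is built and the scan stops at the first match.
  def check_with_reduce(gates, actor_id) do
    Enum.reduce_while(gates, :ignore, fn
      %{type: :actor, id: ^actor_id, enabled: enabled}, _acc ->
        {:halt, {:ok, enabled}}

      _gate, acc ->
        {:cont, acc}
    end)
  end
end

gates = [
  %{type: :boolean, id: nil, enabled: false},
  %{type: :actor, id: "user:1", enabled: true},
  %{type: :group, id: "beta_testers", enabled: true}
]

GateSketch.check_with_filter(gates, "user:1")
GateSketch.check_with_reduce(gates, "user:1")
```

Both functions return the same result for the same input. The difference is only that the first one copies all actor gates into a new list on every check, which adds up when a flag holds thousands of them and is checked often.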
Hello, thank you for iterating on this.
I just wanted to align expectations on the timeline. This week I've been too busy to look at this, and I'll be travelling for the next few weeks. I'll look at this when I'm back, in May.
Hello @tompave
I hope your trip went well 😄 Could I help you in any way with this issue?
Hey there, apologies for the long wait. I've tested this locally and it's good to be merged!
Thank you for identifying this area of improvement and for doing the work.
I've done a bit of commit surgery to remove some extra whitespace changes and get the PR to build with the latest CI config.
We use actor-based gates pretty extensively during our new feature rollout process, and we noticed significant slowdowns when a flag contains a large number of gates (>20k) and is checked multiple times.
The current approach relies on an intermediary list created by Enum.filter/2. With the new approach, only a reduce is used. Benchmark below (5k gates):