whitfin / cachex

A powerful caching library for Elixir with support for transactions, fallbacks and expirations
https://hexdocs.pm/cachex/
MIT License
1.54k stars 97 forks source link

Allow warmers to trigger hooks #322

Closed camilleryr closed 5 months ago

camilleryr commented 8 months ago

A) Sorry that this didnt start with an issue - feel free to close this and we can move any discussion over there, but I put together this PR as a POC to ensure that this solved our issue on our application side

B) The issue we were running into is that we wanted to use a Hook to react to when a Warmer ran and found that Warmers did not trigger hooks - this issue seems to be caused by the fact that warmers are initialized with a cache spec before the hooks are associated to their pids, so the notification filters these out thinking that they are not running.

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

Please let me know if there are any questions / comments / changes you would like to see - and thank you for all of the work you do by maintaining this project

whitfin commented 8 months ago

Hey @camilleryr!

I've only glanced over this so far, I'll take a deeper look in future. Thank you for contributing! No worries about no corresponding issue, I don't really care; I just prefer that to save people doing unnecessary work.

My initial thoughts (again, with only a quick glance over):

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Yep makes sense. I'm not entirely a fan of calling via the name, I'm wondering if we can do something like :provision that hooks have (but also I really hate that interface, so maybe not). Cache warmers are (in theory) called less frequently, so the overhead of using the name is very likely irrelevant in practice - even if the perfectionist in me would like to avoid it.

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

I'm not sure I'm a fan of this, it feels a bit awkward and unnatural to me. I understand there are cases where you might want this, but you could also manually implement this waiting without having to change it for everyone (and thus add the maintenance burden). I guess I'm curious what your hook is doing; it feels like you implicitly know a warmer is running on startup so I'm curious why you'd have to be notified about it?

Please let me know if there are any questions / comments / changes you would like to see

Although I understand why you did it, I'm not a huge fan of create_hook in the test code - I'd rather have the modules be explicitly defined in the same test file rather than hiding them away. Debugging some of that common stuff is already a headache (particularly because it's async), so I prefer to have it all compact. I know this is just my personal preference though, so I hope you don't take this as a negative!

camilleryr commented 8 months ago

This PR addresses that particular issue by having the warmer call put_many with just the cache name instead of the cache spec so that the current state will always be used to avoid any possibility of a race condition between the warmer running and the hooks starting and being added to the Overseer's cache state

Yep makes sense. I'm not entirely a fan of calling via the name, I'm wondering if we can do something like :provision that hooks have (but also I really hate that interface, so maybe not). Cache warmers are (in theory) called less frequently, so the overhead of using the name is very likely irrelevant in practice - even if the perfectionist in me would like to avoid it.

I agree that this is a suboptimal solution, but I also though about the performance impact and had the same though about how frequently these are used! Im open to other approaches, but this seemed like the path of least resistance to ensuring that we would never call put_many with a stale state from the warmer

Additionally, this PR rearranges the lifecycle of the warmer a bit to ensure that the first run will happen after any hooks have been started / attached so that notifications will not be missed

I'm not sure I'm a fan of this, it feels a bit awkward and unnatural to me. I understand there are cases where you might want this, but you could also manually implement this waiting without having to change it for everyone (and thus add the maintenance burden). I guess I'm curious what your hook is doing; it feels like you implicitly know a warmer is running on startup so I'm curious why you'd have to be notified about it?

I am generally sensitive to the potential for race conditions - so when approaching how to allow warmers to trigger hooks I wanted to guarantee that the fix would work as expected every time. Without some coordination between the overseer and incubator there would be a potential for the warmer to run and have the put_many call happen before the Overseer.update call that occurs in Cachex.start_link which would result in the notification being missed. That being said - I don't think there is any tangible change for an existing user - the hooks and warmers still function the same way, but are just orchestrated in a way that enforces ordering. The additional waiting time would be only at startup and would be (I assume) negligible. The maintenance cost is definitely negotiable though!

As far as what we are trying to achieve - We have a work load that would rely heavily on this cached data, and ideally we would not start the workload until the warmer has ran. We could utilize the async: false attribute for the warmer for this, but that would disrupt the start up of our application and is generally not an approach that fits into our design. My initial though was that we could send a message from the warmer to kick off our work, but since we would be sending a message before the call to put_many has completed, we could end up starting our work before the cache is ready. These changes are designed to allow us to have a hook listening for the put_many notification to then send a message to our workload orchestrator that we are ready to start. (Once again, sensitive to the potential for race conditions)

Please let me know if there are any questions / comments / changes you would like to see

Although I understand why you did it, I'm not a huge fan of create_hook in the test code - I'd rather have the modules be explicitly defined in the same test file rather than hiding them away. Debugging some of that common stuff is already a headache (particularly because it's async), so I prefer to have it all compact. I know this is just my personal preference though, so I hope you don't take this as a negative!

Lol - I thought the same thing, but went with the helper to follow what I thought of as a convention here - I can definitely pull that out and replace it with a explicitly defined module

I understand if you dont think this warrants inclusion in cachex proper, but thought I would propose some changes since these pieces did not compose in the way I anticipated!

Thanks for the review and I am looking forward to more of your thoughts.

whitfin commented 8 months ago

Most of this makes sense to me, I'll give it another look over and see if anything else comes up. It seems like the required changes basically boil down to this (correct me if I'm wrong, of course):

  1. Hooks should be available prior to Warmers.
    • I think I consider it a bug that it doesn't already work this way?
    • It feels like any change here is a definite improvement.
  2. Warmers should use the latest version of cache state.
    • I definitely consider this one a bug!
    • There's no good reason to not use the name
    • The overhead is so small, it will only ever cause negative side effects (like this PR) by using the state

Then there's the third change regarding startup, but I'm not exactly sure what I think about this. Thinking out loud on this a bit more (disclaimer: kinda sick, and it's 1am, so feel free to tell me if I'm being stupid):

I see the dilemma with the current call to Overseer.update/2 being at the end of the chain so anything prior could race, but I'd rather do something to fix this cause rather than just sync the hooks/warmers stuff up specifically.

Technically Overseer.update/2 is sequential, so maybe we could have hooks self-register on startup instead - in theory then as long as they're started before the Warmers, it should all be up to date (right?). There is definitely a hit at cache startup here, but it's probably a lot more robust than linking it all after the fact. Note to self to check potential blocking cycles here, but I think it's safe.

I haven't really looked at how this would appear in the codebase, but rather than Informant.link/1 after the main tree is started, we'd just refresh from the Overseer before passing back. It sounds like it would look a lot cleaner, but it might not be in practice. Looking back at it, it's definitely weird l that there's some bookkeeping to be done after calling Supervisor.start_link/3 - it'd be nice if we could somehow address that here. Feel free to let me know your thoughts!

I've marked this for v3.7.0 which should be in the next couple of weeks (waiting for some follow up on a couple of other threads just to tie those up). Hopefully this timeline works for you? We can definitely improve a lot of this even if we don't fully merge everything as-is right now.

camilleryr commented 8 months ago

Then there's the third change regarding startup, but I'm not exactly sure what I think about this.

I think you are correct that its better to address the cause of the issue rather than to code around it! I have reimplemented this PR to try and have the Informant register the Hooks as they start up. To do this, I added a wrapper around the GenServer.start_link call that will capture the returned pid and call Overseer.update

This would not have been my preferred approach, as I think I would have put this in init rather than start_link - but that would require a bit of structural changes to facilitate, since the current init function is overridable, so we would need to create a GenHook type behaviour that delegates to the individual Hook implementation - its doable, but would require more code and I dont think there are really any down sides to where this is currently placed - let me know if you think there is a more suitable place to patch this in

Additionally - I pulled out that create_hook helper and implemented the module in the test suite - then found that there was a preexisting helper that not only created a hook - but created exactly the hook I needed, so I ended up reusing that - while this might use an abstraction that you eventually want to do away with - at least its a pre existing one!

Hope you feel better soon - and as always, let me know your thoughts on this and thank you for your time.

camilleryr commented 6 months ago

Wanted to check in and see if you had any additional thoughts on this work

whitfin commented 6 months ago

Hi @camilleryr! I was actually planning on looking at this today, what strange timing.

I can run through this now with my thoughts, then you can adapt as you see fit - then I'll probably get this merged in this weekend (maybe with some further changes from my side if necessary).

I'll leave a quick review, and if you could resolve the conflicts (which don't seem to make sense to me...) it'd be great! I should note I'm also happy to pick this up if your time is limited, just let me know!

camilleryr commented 6 months ago

I refactored this, hopefully its closer to what you were describing. With the exception of the updates to the stats tests, I think that this is the cleanest implementation yet - thanks for the guidance, and let me know what additional changes you would like to see.

whitfin commented 5 months ago

Looks good, thanks @camilleryr!

whitfin commented 5 months ago

This will be shipped in the next release, which is looking like it will be v4.0.0. Your work here has raised a lot of ugliness with the existing warmer implementation, so I'm going to take this opportunity to refactor all of it before release - so thank you once again for your time and patience with all of this!