Here I just implement the changes needed to the data manager. I anticipated this would be as simple as adding *args, **kwargs in a couple of places, but it turned out to be much more complicated than that. In short:
for memoization to work fully (so with def f(a=1): ..., we need f(a=1) and f(1) and f() to all have same cache key) flask-caching relies on inspecting the signature of the function being memoized
previously we memoized the _DynamicData.load bound method, but it is basically impossible for this to be done in a way that mirrors the signature of the underlying load_data function (details in https://github.com/GrahamDumpleton/wrapt/issues/263)
hence I've moved the memoization inside the load function itself
this means that the memoized function could be either a function or a bound method (e.g. in the important case of kedro datasets)
handling bound methods with flask-caching is tricky because it's difficult to set independent timeouts without lots of hacking
eventually I found the cleanest way to achieve this was to handle everything as a partial function named with the data source name. This ensures that both bound methods and functions are handled correctly, with caches keyed by data source name
small consequence of this: using partial functions now works so I've removed the error raised in DataManager.__setitem__
Also I've written many more tests to check this all works. These are a bit slow because they run for many different functions and include time.sleep. If they become annoying to run in CI we can tag them as slow and split them out or skip them or something.
To do in this PR
[x] @petar-qb to do manual test with redis 🙏 No need to test anything that's new functionality involving passing arguments, just to check I didn't break anything that worked before. I suggest you pick a few representative cases from https://github.com/mckinsey/vizro/pull/398#issuecomment-2049540014 and see what the cache keys look like since they'll be a bit different from before. Then please say if anything looks amiss
[ ] @petar-qb to do manual test with redis to check new functionality including passing arguments. Since you'll be familiar with what they look like now after testing this current PR, hopefully it'll be easy to check it all makes sense when we test in the following PR
[ ] add to changelog
To do separately
[x] Make new ticket to tidy log_call
Screenshot
Notice
[x] I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorized to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorized by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
I have not referenced individuals, products or companies in any commits, directly or indirectly.
I have not added data or restricted code in any commits, directly or indirectly.
Description
This is step 1 out of 2 in completing https://github.com/McK-Internal/vizro-internal/issues/753/ to allow for parametrisation of dynamic data from the frontend.
Here I just implement the changes needed to the data manager. I anticipated this would be as simple as adding
*args, **kwargs
in a couple of places, but it turned out to be much more complicated than that. In short:def f(a=1): ...
, we needf(a=1)
andf(1)
andf()
to all have same cache key) flask-caching relies on inspecting the signature of the function being memoized_DynamicData.load
bound method, but it is basically impossible for this to be done in a way that mirrors the signature of the underlyingload_data
function (details in https://github.com/GrahamDumpleton/wrapt/issues/263)load
function itselfpartial
function named with the data source name. This ensures that both bound methods and functions are handled correctly, with caches keyed by data source namepartial
functions now works so I've removed the error raised inDataManager.__setitem__
Also I've written many more tests to check this all works. These are a bit slow because they run for many different functions and include
time.sleep
. If they become annoying to run in CI we can tag them as slow and split them out or skip them or something.To do in this PR
To do in following PR #482
To do separately
log_call
Screenshot
Notice
[x] I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":