Query caching - Githubissues

bschilder commented 2 months ago

One thing to consider is adding query caching.

If the exact same query is run more than once within some time frame (say, within a single R session), it might be desirable to cache the result so that subsequent runs of that query are faster.

For example, the first time this runs it takes ~8 seconds. If cached, it could be instantaneous to run it a second time.

library(monarchr)

monarch <- monarch_engine()

alz_diseases <- monarch |>
    fetch_nodes(query_ids = "MONDO:0004975") |>
    expand(predicates = "biolink:subclass_of", direction = "in", transitive = TRUE)

That said, we would need some way of tracking exactly how the query was constructed. If anything was modified (input IDs, arguments, global options, etc.), we would need to detect this automatically and rerun the query from scratch.
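To make the change-detection idea concrete, here is a minimal base-R sketch, not a proposal for the actual implementation. It assumes the cache key is built from the query text, its parameters, and any global options with a `monarchr.` prefix (that prefix, `slow_query()`, and `cached_query()` are all hypothetical names, and the query string is a placeholder). A lookup only hits when the whole key is `identical()` to a stored one, so any modified input triggers a rerun.

```r
# Hypothetical in-session cache: a list of (key, value) entries in an environment.
query_cache <- new.env()
query_cache$entries <- list()

# Stand-in for a slow graph query; counts how often it really runs.
query_runs <- 0
slow_query <- function(query, parameters = list()) {
  query_runs <<- query_runs + 1
  list(query = query, parameters = parameters)
}

cached_query <- function(query, parameters = list()) {
  # The key holds everything assumed to affect the result:
  # query text, parameters, and options named "monarchr.*" (assumed prefix).
  opts <- options()
  key <- list(
    query = query,
    parameters = parameters,
    options = opts[startsWith(names(opts), "monarchr.")]
  )

  # Linear scan; return the stored value on an exact key match.
  for (entry in query_cache$entries) {
    if (identical(entry$key, key)) {
      return(entry$value)
    }
  }

  # No match: run the query and remember the result under this key.
  value <- slow_query(query, parameters)
  n <- length(query_cache$entries)
  query_cache$entries[[n + 1]] <- list(key = key, value = value)
  value
}

# Placeholder query text, for illustration only.
first  <- cached_query("MATCH (n {id: $id}) RETURN n", list(id = "MONDO:0004975"))
second <- cached_query("MATCH (n {id: $id}) RETURN n", list(id = "MONDO:0004975"))  # cache hit
third  <- cached_query("MATCH (n {id: $id}) RETURN n", list(id = "HP:0000001"))     # new key, reruns
```

The linear scan keeps the sketch dependency-free; a real version would more likely hash the key, which is what memoise does internally.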

Speaking of global options, if we do implement caching, it would be good to have a way of turning it off globally (by setting option variables) or locally (by wrapping code in a function that forces a fresh query only for the code it wraps, e.g. nocache({id="HP:00001"; fun1(id)})).
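A rough sketch of how the global and local switches could fit together, using only base R. Everything here is an assumption for illustration: the option name `monarchr.cache`, the `fetch_label()` stand-in, and the exact behaviour of `nocache()` (here it bypasses cache reads but still refreshes the stored value).

```r
# Hypothetical cache keyed by ID, plus a counter of real (uncached) runs.
label_cache <- new.env()
fetch_runs <- 0

# Stand-in for a cache-aware query function.
fetch_label <- function(id) {
  # Caching is on unless the (assumed) option monarchr.cache is set to FALSE.
  use_cache <- isTRUE(getOption("monarchr.cache", TRUE))
  if (use_cache && exists(id, envir = label_cache, inherits = FALSE)) {
    return(get(id, envir = label_cache))
  }
  fetch_runs <<- fetch_runs + 1
  result <- paste("result for", id)
  assign(id, result, envir = label_cache)  # refresh the cached value
  result
}

# Local switch: evaluate `expr` with caching disabled, then restore the option.
nocache <- function(expr) {
  old <- options(monarchr.cache = FALSE)
  on.exit(options(old), add = TRUE)
  force(expr)  # lazy evaluation: the wrapped code runs here, with caching off
}

fetch_label("HP:0000001")           # runs the query
fetch_label("HP:0000001")           # served from the cache
nocache(fetch_label("HP:0000001"))  # forced fresh run

# Global switch: options(monarchr.cache = FALSE) disables caching everywhere.
```

Because R evaluates arguments lazily, the wrapped expression is only run inside `nocache()` after the option has been changed, and `on.exit()` restores the previous setting even if the wrapped code fails.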

oneilsh commented 2 months ago

Good call - yeah, the amount of time to cache is an open question. Perhaps that (and disabling caching) is something that can be set at the engine level via the preferences feature? I think I like the "within a single R session" option - safe and should be easy to implement. I could also see longer-term caching (e.g. 2 weeks), with the obvious caveat of not fetching fresh data from the graph - which might cause an issue if we cache a query and then a later query pulls updated info that conflicts somehow.

I'm thinking the memoise package applied to the lower level Neo4j functions (cypher_query.neo4j_engine and cypher_query_df.neo4j_engine) would be the right place, with some logic for checking if caching is disabled.
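Roughly what I have in mind, as a sketch only. `run_query()` is a stand-in for the real low-level query functions, `cypher_query_cached()` and the `monarchr.cache` option are hypothetical names, and the query string is a placeholder. The in-memory cache from cachem lives only for the current R session, and `max_age` (in seconds) caps how long an entry can be reused.

```r
library(memoise)

# Stand-in for a low-level Neo4j query function; counts real executions.
query_runs <- 0
run_query <- function(query, parameters = list()) {
  query_runs <<- query_runs + 1
  list(query = query, parameters = parameters)
}

# Memoised copy. The in-memory cache is dropped when the session ends;
# entries older than two weeks (in seconds) are treated as expired.
run_query_memo <- memoise(
  run_query,
  cache = cachem::cache_mem(max_age = 14 * 24 * 60 * 60)
)

# Wrapper with the "is caching disabled?" check (assumed option name).
cypher_query_cached <- function(query, parameters = list()) {
  if (isTRUE(getOption("monarchr.cache", TRUE))) {
    run_query_memo(query, parameters)
  } else {
    run_query(query, parameters)
  }
}

a <- cypher_query_cached("MATCH (n {id: $id}) RETURN n", list(id = "MONDO:0004975"))
b <- cypher_query_cached("MATCH (n {id: $id}) RETURN n", list(id = "MONDO:0004975"))  # cache hit
```

memoise hashes the function arguments to build the cache key, so changing the query or its parameters reruns it automatically. `forget(run_query_memo)` clears the cache by hand. One thing to check in the real methods is whether the engine object passed as an argument hashes stably between calls.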

bschilder commented 1 month ago

> I'm thinking the memoise package applied to the lower level Neo4j functions (cypher_query.neo4j_engine and cypher_query_df.neo4j_engine) would be the right place, with some logic for checking if caching is disabled.

That makes sense to me. Tying the caching closer to the Neo4j queries is probably the best way to go.

monarch-initiative / monarchr

Query caching #37