kjetilk / p5-atteanx-query-cache

Experimental prefetching SPARQL query cacher, take 2

Add results of LDFTriples to the cache #29

Closed kjetilk closed 8 years ago

kjetilk commented 8 years ago

When a query is executed, its plan may contain LDFTriple plans, which cause the evaluator to download and process RDF. This RDF should also be stored in our own cache in case it can be reused.

RDF::LDF maintains its own cache of all the (possibly paged) fragments as HTTP::Response objects in a CHI cache (or will, once LWP::UserAgent::CHICaching is used). This cache will be used by get_bindings, which I suppose is called from the impl and substitute_impl methods: https://github.com/phochste/AtteanX-Store-LDF/blob/master/lib/AtteanX/Store/LDF/Plan/Triple.pm#L124

However, AtteanX::Query::Cache uses triple patterns as cache keys, and stores an array as the value when the pattern has a single variable and a hash when it has two. Therefore, I need some way of inserting the results into this cache in addition to the RDF::LDF cache. It should have as little impact as possible on the evaluation of the query, so it would be nice to do as much as possible asynchronously. After all, the cache is useful for subsequent queries, which may take quite some time to arrive.
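To make the shape of the cache values concrete, here is a minimal sketch in core Perl. The helper name `cache_value_for`, the key string format and the exact hash layout (first variable's term mapping to an array of the second variable's terms) are assumptions for illustration, not taken from the AtteanX::Query::Cache code, and a plain hash stands in for the CHI cache.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of bindings (hashrefs of
# variable name => term string) into the value stored under a
# triple-pattern key.
sub cache_value_for {
    my ($vars, $bindings) = @_;
    if (@$vars == 1) {
        # One variable: a plain array of the bound terms.
        return [ map { $_->{ $vars->[0] } } @$bindings ];
    }
    if (@$vars == 2) {
        # Two variables: a hash from the first variable's term to an
        # array of the second variable's terms (assumed layout).
        my %value;
        push @{ $value{ $_->{ $vars->[0] } } }, $_->{ $vars->[1] } for @$bindings;
        return \%value;
    }
    die "patterns with " . scalar(@$vars) . " variables are not handled in this sketch\n";
}

my %cache;    # stand-in for the CHI cache
$cache{'?s <http://example.org/p> ?o .'} = cache_value_for(
    [ 's', 'o' ],
    [
        { s => '<http://example.org/a>', o => '"1"' },
        { s => '<http://example.org/a>', o => '"2"' },
        { s => '<http://example.org/b>', o => '"3"' },
    ],
);
```

Whatever inserts the LDF results would have to produce values of this kind from the downloaded triples, which is more than copying the HTTP::Response objects across.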

My usual way of doing things asynchronously is to use Redis pubsub. So, my main idea is to just send off a quick publish with the triple pattern that needs to be processed, and then let the subscriber look it up in the RDF::LDF cache and insert the result into the AtteanX::Query::Cache cache.
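Roughly, the hand-off could look like the sketch below. To keep it runnable without a server, an array stands in for the Redis channel and plain hashes stand in for the two caches; the function names and the idea of keying the RDF::LDF cache directly by pattern are simplifying assumptions (the real cache holds HTTP::Response objects). With Redis, the `push` would become a publish and the drain loop a subscriber callback.

```perl
use strict;
use warnings;

my @channel;        # stand-in for the Redis channel
my %ldf_cache;      # stand-in for the RDF::LDF response cache
my %query_cache;    # stand-in for the AtteanX::Query::Cache cache

# Publisher side: called during evaluation, only enqueues the pattern.
sub announce_pattern {
    my ($pattern) = @_;
    push @channel, $pattern;
    return;
}

# Subscriber side: runs later and copies what the LDF cache holds.
sub drain_channel {
    my $copied = 0;
    while (defined(my $pattern = shift @channel)) {
        # Not downloaded yet: this sketch skips it; the real subscriber
        # could fall back to a remote call instead.
        next unless exists $ldf_cache{$pattern};
        $query_cache{$pattern} = [ @{ $ldf_cache{$pattern} } ];
        $copied++;
    }
    return $copied;
}

my $known   = '<http://example.org/a> <http://example.org/p> ?o .';
my $unknown = '<http://example.org/b> <http://example.org/p> ?o .';
$ldf_cache{$known} = [ '"1"', '"2"' ];

announce_pattern($known);
announce_pattern($unknown);
my $copied = drain_channel();
```

The point of the design is that the evaluator only pays for the publish; the lookup and the conversion happen in the subscriber.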

There are two main problems:

1) I'm not sure how to do it in practice: should I extend AtteanX::Store::LDF::Plan::Triple and wrap impl (and substitute_impl, I'm not sure what the difference is)? Or perhaps hack get_bindings somehow? After all, it iterates over the data that is to be inserted into the AtteanX::Query::Cache cache.
2) Since it is asynchronous, the subscriber may start looking in the cache before the data has finished downloading. OTOH, that may not be a problem, since a call to RDF::LDF::get_fragments will then result in a remote call, and that would be OK, I suppose.
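For the first option, wrapping impl in a subclass might look something like the sketch below. The classes are hypothetical stand-ins, not the real AtteanX::Store::LDF::Plan::Triple: I am assuming that impl returns a code ref which, when called, returns an iterator, and I model that iterator as a closure rather than an Attean iterator object. The `on_done` callback and the `pattern` and `rows` attributes are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the LDF triple plan: impl returns a code ref
# that, when called, returns an iterator (here a closure yielding rows).
package Local::Plan::Triple {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
    sub impl {
        my ($self) = @_;
        my @rows = @{ $self->{rows} };
        return sub {
            my @pending = @rows;
            return sub { return shift @pending };
        };
    }
}

# Subclass that copies every row into a buffer and hands the buffer to a
# callback once, when the underlying iterator is exhausted.
package Local::Plan::CachingTriple {
    our @ISA = ('Local::Plan::Triple');
    sub impl {
        my ($self, @args) = @_;
        my $inner = $self->SUPER::impl(@args);
        return sub {
            my $iter = $inner->(@_);
            my (@seen, $done);
            return sub {
                my $row = $iter->();
                if (defined $row) {
                    push @seen, $row;
                }
                elsif (!$done++) {
                    $self->{on_done}->($self->{pattern}, [@seen]);
                }
                return $row;
            };
        };
    }
}

package main;

my %query_cache;    # stand-in for the AtteanX::Query::Cache cache
my $plan = Local::Plan::CachingTriple->new(
    pattern => '<http://example.org/a> <http://example.org/p> ?o .',
    rows    => [ '"1"', '"2"' ],
    on_done => sub { my ($pattern, $rows) = @_; $query_cache{$pattern} = $rows },
);

my $iter = $plan->impl->();
my @results;
while (defined(my $row = $iter->())) { push @results, $row }
```

If the callback fires only after the iterator is exhausted, the data is known to be complete at that point, which would sidestep the second problem, at the cost of buffering the rows in the evaluating process.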

Any ideas, @kasei ?