pombreda / appscale

Automatically exported from code.google.com/p/appscale

Datastore Query Cache #180

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is a follow-up to <a
href="http://code.google.com/p/appscale/issues/detail?id=177">Issue
177</a>. In addition to caching simple get/put operations, we can also
cache query requests using a generational caching strategy. This lets
queries be cached cleanly without ugly expiration policies (e.g.
looping over keys to delete them).

The basic idea is to keep a "generation" value for the entire table.
Similar to a version number, this generation number changes any time an
object in the datastore changes (i.e. on put/delete). The generation
number is embedded into the cache key of each individual query, so an
example cache key might be "Query/10/Select * from Users", where 10 is the
generation value. While no updates occur, the generation value stays the
same, so repeated queries map to the same cache key and hit the cache. When
an update occurs, the generation value changes, which causes all newly
generated cache keys to change. Since the new keys differ from the old
ones, a query will never hit old data; the stale entries are effectively
expired from the cache (without being explicitly deleted). For more
information on this concept see <a
href="http://assets.en.oreilly.com/1/event/27/Accelerate%20your%20Rails%20Site%20with%20Automatic%20Generation-based%20Action%20Caching%20Presentation%201.pdf">this
presentation</a>.
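
To make the key rotation concrete, here is a minimal sketch that uses a plain dict in place of memcached. The names `cache`, `generation` and `make_key` are made up for this illustration and are not part of AppScale.

```python
# Minimal illustration of generation-based expiry.
# `cache` is a plain dict standing in for memcached (an assumption of this sketch).
cache = {}
generation = 10

def make_key(generation, query_text):
    # The generation is embedded in the key, so bumping it changes every key.
    return "Query/%d/%s" % (generation, query_text)

query_text = "Select * from Users"

# Cache a result under the current generation.
old_key = make_key(generation, query_text)
cache[old_key] = ["alice", "bob"]

# No writes yet: same generation, same key, so the lookup hits.
hit_before = cache.get(make_key(generation, query_text))

# A put/delete bumps the generation ...
generation += 1

# ... so the same query now maps to a different key and misses.
new_key = make_key(generation, query_text)
hit_after = cache.get(new_key)
```

Note that the entry under `old_key` is never deleted here; it simply stops being looked up. With a real cache, such orphaned entries would be left for the cache's normal eviction to reclaim.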

While the above presents the high-level idea, we probably want to make a
few changes so that it fits this application better. First, we should
maintain a generation value for each application instead of one for the
global table, since different applications should not affect each other.
Second, we need to generate a hash key for Query objects that is identical
for identical queries. A simple approach is to take the md5 of all the
fields of the Query object together. I will use hash(query_object)
to refer to that method below.
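
One possible way to build such a hash is sketched below. It assumes the query can be reduced to a dict of JSON-serializable fields; the field names (`kind`, `filters`, `limit`) and the function name `query_hash` are hypothetical and do not reflect the actual Query object.

```python
import hashlib
import json

def query_hash(query_fields):
    """Return an md5 hex digest that is stable for equal query fields.

    `query_fields` is assumed to be a dict of JSON-serializable values.
    """
    # sort_keys gives a canonical field order, so two dicts with the same
    # contents serialize to the same string regardless of insertion order.
    canonical = json.dumps(query_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Same query, fields listed in a different order.
q1 = {"kind": "Users", "filters": [["age", ">", 21]], "limit": 10}
q2 = {"limit": 10, "filters": [["age", ">", 21]], "kind": "Users"}
# A genuinely different query.
q3 = {"kind": "Users", "filters": [["age", ">", 21]], "limit": 20}

h1, h2, h3 = query_hash(q1), query_hash(q2), query_hash(q3)
```

The digest is always 32 hex characters, which keeps the cache key short no matter how large the query is. In this sketch the order of entries inside `filters` still matters; whether reordered filters should count as the same query is a design decision left open here.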

Below is a sketch of what the code would look like:

# Gets the generation value for the current app; it is stored in the cache
# as well. Defaults to 1 if no generation has been stored yet.
def generation(app_name):
    return cache.get(generation_key(app_name), initial_value=1)

# Generates the generation key for the given application.
# Each app has a different generation key so they don't affect each other.
def generation_key(app_name):
    return "GENERATION/Query/%s" % app_name

# Generates a cache key for a query. The result must stay within
# memcached's 250-byte key limit.
def cache_key(app_name, generation, query_object):
    # hash(query_object) turns the query into a string that is identical
    # for identical queries
    return "Query/%s/%s/%s" % (app_name, generation, hash(query_object))

# Example put method; delete follows the same idea
def put(app_name, key, value):
    datastore.put(key, value)
    # Bump the generation to "expire" old cached query results
    cache.incr(generation_key(app_name), initial_value=1)

def query(app_name, query_object):
    key = cache_key(app_name, generation(app_name), query_object)
    value = cache.get(key)
    # Compare against None so that a cached empty result still counts as a hit
    if value is not None:
        return value
    # Missed, do the actual query
    result = datastore.query(query_object)
    cache.put(key, result)
    return result
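
For anyone who wants to try the idea without a memcached or datastore instance, the sketch above can be exercised end to end with in-memory stand-ins. Everything below is an assumption made for illustration: `DictCache` and `DictDatastore` are hypothetical stubs, the "query" is reduced to a kind name, and the app name `guestbook` is made up.

```python
import hashlib

class DictCache(object):
    """In-memory stand-in for memcached (hypothetical; get/set/incr only)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value
    def incr(self, key, initial_value=1):
        # A missing counter starts at initial_value and is then incremented.
        self._data[key] = self._data.get(key, initial_value) + 1
        return self._data[key]

class DictDatastore(object):
    """In-memory stand-in for the datastore; counts how often it is queried."""
    def __init__(self):
        self.rows = {}
        self.query_count = 0
    def put(self, key, value):
        self.rows[key] = value
    def query(self, kind):
        self.query_count += 1
        prefix = kind + "/"
        return sorted(v for k, v in self.rows.items() if k.startswith(prefix))

cache = DictCache()
datastore = DictDatastore()

def generation_key(app_name):
    return "GENERATION/Query/%s" % app_name

def generation(app_name):
    value = cache.get(generation_key(app_name))
    return 1 if value is None else value

def cache_key(app_name, gen, kind):
    digest = hashlib.md5(kind.encode("utf-8")).hexdigest()
    return "Query/%s/%s/%s" % (app_name, gen, digest)

def put(app_name, key, value):
    datastore.put(key, value)
    # Every write bumps the app's generation, orphaning old query keys.
    cache.incr(generation_key(app_name), initial_value=1)

def query(app_name, kind):
    key = cache_key(app_name, generation(app_name), kind)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = datastore.query(kind)
    cache.set(key, result)
    return result

put("guestbook", "Users/1", "alice")
first = query("guestbook", "Users")    # miss: goes to the datastore
second = query("guestbook", "Users")   # hit: served from the cache
queries_after_two_reads = datastore.query_count
put("guestbook", "Users/2", "bob")     # bumps the generation
third = query("guestbook", "Users")    # miss again, sees the new row
```

The second read never reaches the datastore, while the read after the second `put` does, because the bumped generation produces a new cache key.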

Original issue reported on code.google.com by jmkupfer...@gmail.com on 7 Mar 2010 at 8:09

GoogleCodeExporter commented 9 years ago

Original comment by nlak...@gmail.com on 6 Sep 2011 at 9:27