in_bucket? performs a superfluous database query

bryanwoods commented 11 years ago

user = User.last
user.in_bucket?(:my_test, 1)
#   (0.2ms)  SELECT MAX(`experiments`.`updated_at`) AS max_id FROM `experiments` 
#   (0.3ms)  SELECT MAX(`experiments`.`updated_at`) AS max_id FROM `experiments` 
#=> false

key88sf commented 11 years ago

@bryanwoods If you issue this function call a 2nd time, does it still make 2 queries?

bryanwoods commented 11 years ago

Yes -- I discovered this in application code that was calling the method twice, and thus making the query four times.

key88sf commented 11 years ago

@rmw I haven't looked at this yet, but any idea why it is doing this?

rmw commented 11 years ago

I'll look into this ...

rmw commented 11 years ago

Ok. I see it. It's happening in the Cache and should only be happening in dev or the first time the server checks the cache. Basically, the first time a server checks for an update the time will be calculated twice (once to get the last_updated_time and once to set the cached update time). After that it'll only happen once, because it'll be fetching the cached update time from the Cache. Maybe there's a better way to do that first check?

key88sf commented 11 years ago

@rmw What are the line #s for the two calculations? I only see the call to Cache.last_cached_update which calls Experiment.last_updated_at. But once that is called, won't the cache have that value from now on?

rmw commented 11 years ago

Calling get for the first time does:

if need_update?(last_cached_update) https://github.com/howaboutwe/experimental/blob/master/app/models/experimental/cache.rb#L14

Which will call last_cached_update, which sets the cache https://github.com/howaboutwe/experimental/blob/master/app/models/experimental/cache.rb#L59

Then back in need_update? we call last_update https://github.com/howaboutwe/experimental/blob/master/app/models/experimental/cache.rb#L36 which gets the date from the cache (except that the ttl is set to 10 seconds, so it's probably setting it again and running Experiment.last_updated_at again). https://github.com/howaboutwe/experimental/blob/master/app/models/experimental/cache.rb#L8

After this, it should get from the Cache itself.

key88sf commented 11 years ago

I think that's right, except the first time this is called there is no value set for the cache key (yes?), so the initial set of the cache with ttl=10 sec means that for the first 10 seconds, no value will be in the cache (it will use the original value of nil). So that's why the 2nd call to the cache results in another database query.

It seems like we should initially set the cache when initializing/loading, that way the first 10 seconds of usage doesn't result in lots of queries to the database. The sync process expires the cache (deletes the key) -- perhaps we can change this so that it updates the key instead?

rmw commented 11 years ago

How do you define "lots" of database queries?

key88sf commented 11 years ago

It depends on traffic. The code seems to imply that in the first 10 seconds after startup, all requests to experiments will result in database calls to find the max updated time. If our traffic is high (or spiking), this could degrade performance.

oggy commented 10 years ago

This is no longer happening (if you have cache_for set).

howaboutwe / experimental

in_bucket? performs a superfluous database query #6