googleapis / google-api-ruby-client

REST client for Google APIs
Apache License 2.0

Caching API calls using Rails low level cache #203

Closed rishighan closed 9 years ago

rishighan commented 9 years ago

I am using the google-api-client gem to connect to the Google Analytics API and request data from my analytics account for pageviews for a 30-day period.

Authentication and data retrieval work fine. The catch is that making that many API calls slows my view down, so I tried to cache the API calls using:

result = Rails.cache.fetch('google_analytics_api/#{cache_key}', expires_in: 2.hours) { client.execute(:api_method => analytics.data.ga.get, :parameters => parameters) }

and I got this error: no _dump_data is defined for class OpenSSL::PKey::RSA

I would really like to cache my API calls; otherwise the page displaying the analytics data loads really slowly. Is caching possible using the google-api-client gem?

Pertinent code: https://gist.github.com/rishighan/dc63d225e1ef63ce72e0

jeremywadsack commented 9 years ago

I would suggest making calls like that in a background process and storing the data, rather than on-the-fly. Even using the API Explorer I get >500ms response times so this would be a really slow response.

That said, have you tried just caching the data? Does your view need anything else from the Result object?

rishighan commented 9 years ago

I only need the results of the query. My understanding, though, is that I am already caching just the result. Or am I not?

If I were to run that in a background process, what should I be looking at in terms of implementation?

Are there any gems for that purpose? I am new to the concept of caching, hence the naïveté.

jeremywadsack commented 9 years ago

The Result object returned by execute contains a reference to the original request, the Faraday response, and the data. It may also retain references to the analytics API discovery object and your parameters. It is clearly storing a reference to an RSA key for authentication, which you likely don't want to keep around. In any event, that's a lot to store in the cache if you don't need it.

When you cache something, the cache store has to convert it to a form it can store, which may mean serializing it to write to disk (that is where your no _dump_data is defined message comes from). Also, the cache is probably holding things in memory, so you want to keep each entry as small as possible, especially if you'll be caching a lot of different data sets.
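
The serialization failure can be reproduced without the gem. This is a minimal sketch, assuming the cache store serializes entries with Ruby's Marshal. The hash below is a hypothetical stand-in for the Result object, and the Proc plays the role of the RSA key (a Proc is used because it reliably cannot be marshaled).

```ruby
# Hypothetical stand-in for the gem's Result object: plain row data plus a
# reference that Marshal cannot serialize (a Proc here, standing in for the RSA key).
result = {
  rows: [["/posts/1", "42"], ["/posts/2", "7"]], # made-up data
  signer: proc { :sign }
}

# Serializing the whole object fails on the unserializable reference.
dump_error = begin
  Marshal.dump(result)
  nil
rescue TypeError => e
  e.message
end
puts dump_error

# Serializing only the plain data works and round-trips unchanged.
rows = result[:rows]
restored = Marshal.load(Marshal.dump(rows))
puts restored == rows
```

Caching only the rows avoids the error and also keeps the cache entry small.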

Look at how you use the data in your view and determine whether you can slim the cached data down. For example, this would cache a table of data with the first row being the column headers:

data = Rails.cache.fetch('google_analytics_api/#{cache_key}', expires_in: 2.hours) do
  result = client.execute(:api_method => analytics.data.ga.get, :parameters => parameters)
  [result.data.column_headers.map(&:name)] + result.data.rows
end

For background processing, look at ActiveJob in Rails 4.2, which still needs a queueing backend such as Resque, Sidekiq, or DelayedJob. In Rails 3 you would use one of those libraries directly.

rishighan commented 9 years ago

@jeremywadsack Thanks for that excellent response. I will definitely look into that approach to caching selectively. I am also going to experiment with Sidekiq along with ActiveJob.

I originally opened this issue thinking that this particular gem did not allow caching of the request. That is clearly not the case, so I am marking this as closed.

jeremywadsack commented 9 years ago

Well, I'm just showing you how to sidestep the gem. :)

Good luck.

P.S. As long as you are caching, you might also want to cache the discovered API. The sample code suggests this (using a file), but it would be dead simple using your Rails cache.

analytics = Rails.cache.fetch(API_CACHE_KEY) do
  client.discovered_api('analytics', API_VERSION)
end

rishighan commented 9 years ago

I ended up with this solution:

interim = Rails.cache.fetch('google_analytics_api/#{cache_key}', expires_in: 2.minutes) do
  result = client.execute(:api_method => analytics.data.ga.get, :parameters => parameters)
  result.data.rows.map{|hit| hit[1].to_i}.join(', ')
end

I am trying to get pageviews per blog post, but I get the same result for every single blog post. How can I fix this?

jeremywadsack commented 9 years ago

What's your cache key? It needs to be distinct for each blog post so that the metrics don't overwrite each other.

As the parameters define the data, you could use them to make the key:

cache_key = parameters.map { |k,v| "#{k}:#{v}" }.join(':')

rishighan commented 9 years ago

I didn't know that. (karma--)

cache_key = parameters.map { |k,v| "#{k}:#{v}" }.join(':')
interim = Rails.cache.fetch('google_analytics_api/#{cache_key}', expires_in: 2.minutes) do
  result = client.execute(:api_method => analytics.data.ga.get, :parameters => parameters)
  result.data.rows.map{|hit| hit[1].to_i}.join(', ')
end

What happens is:

  1. On my home page, I click on a blog post.
  2. I have set up each single blog post page to display the pageviews for that particular post.
  3. With the code I listed above, I get the pageviews correctly for the first post.
  4. After that, if I go to any other blog post, the pageviews are the same for every other post.

jeremywadsack commented 9 years ago

The cache is just a name-value store. Like a file system, if you save lots of things with the same filename, you'll read back the last thing you wrote.
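
A small sketch of that name-value behaviour, using a hypothetical in-memory stand-in for Rails.cache.fetch and made-up pageview numbers. One difference from a plain write is worth noting: with fetch the block only runs on a miss, so a shared key keeps returning whatever was cached first.

```ruby
# Hypothetical in-memory stand-in for Rails.cache.fetch: return the stored
# value for a key, or run the block once and store its result.
store = {}
fetch = lambda do |key, &compute|
  store.key?(key) ? store[key] : (store[key] = compute.call)
end

pageviews = { "/posts/1" => 42, "/posts/2" => 7 } # made-up numbers

# Same key for every post: the second lookup is a hit on the first post's value.
shared = pageviews.keys.map { |path| fetch.call("pageviews") { pageviews[path] } }

# One key per post: each post gets its own cache entry.
distinct = pageviews.keys.map { |path| fetch.call("pageviews#{path}") { pageviews[path] } }

p shared   # [42, 42]
p distinct # [42, 7]
```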

Not sure why that's happening without looking at the rest of your code. Are you sure that you are passing in the correct URL?

You could simplify things by making the cache key just the URL for the blog post rather than all the parameters. I just figured the list of parameters to that API distinctly identifies the data.

Steps I would try:

  1. Remove the cache temporarily and make sure each page gets its own results.
  2. Use the debugger or logging to inspect the cache key on each request and make sure it represents the correct page.
  3. Log cache misses (within the fetch block) and see if it's hitting the Analytics API for each new blog post.
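
Steps 2 and 3 could be sketched roughly like this. It is only an illustration: the lambda-based cache is a hypothetical stand-in for Rails.cache, the logger writes to a string instead of the Rails log, and the keys and result value are made up.

```ruby
require "logger"
require "stringio"

log_output = StringIO.new
logger = Logger.new(log_output)

# Hypothetical in-memory stand-in for Rails.cache.fetch.
store = {}
fetch = lambda do |key, &compute|
  store.key?(key) ? store[key] : (store[key] = compute.call)
end

lookup = lambda do |cache_key|
  logger.info("cache key: #{cache_key}") # step 2: inspect the key on every request
  fetch.call(cache_key) do
    # step 3: this line only runs on a miss, i.e. when the API would be called
    logger.info("cache miss for #{cache_key}")
    :fake_api_result
  end
end

lookup.call("/posts/1")
lookup.call("/posts/1") # hit: no miss is logged
lookup.call("/posts/2")

miss_count = log_output.string.scan("cache miss").size
puts log_output.string
puts miss_count
```

If every post logs the same key, or a miss is logged only for the first post, the key is the problem rather than the API call.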

rishighan commented 9 years ago

Thanks @jeremywadsack. I made a gist for the pertinent code: https://gist.github.com/rishighan/dc63d225e1ef63ce72e0

I have the credentials in a helper file and pageviews.rb has the method that makes the query request.

otty commented 3 days ago

This issue has been closed, but it came up in a search engine. While reading it, I noticed the problem with your cache key:

irb(main):003> cache_key = "fubar"
=> "fubar"
irb(main):004> "google_analytics_api/#{cache_key}"
=> "google_analytics_api/fubar"
irb(main):005> 'google_analytics_api/#{cache_key}'
=> "google_analytics_api/\#{cache_key}"

So, AFAIK, because single-quoted Ruby strings are not interpolated, you are using the literal string "google_analytics_api/\#{cache_key}" as the cache key, not "google_analytics_api/fubar", which is what you actually wanted to use.
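
Putting that together with the parameter-based key suggested earlier in the thread, a sketch of the difference might look like this. The parameter hashes are made-up placeholders, not real Analytics API parameters.

```ruby
# Builds the key the same way as suggested earlier in the thread.
build_key = ->(parameters) { parameters.map { |k, v| "#{k}:#{v}" }.join(':') }

# Hypothetical parameter sets for two different blog posts.
post_a = { "path" => "/posts/1" }
post_b = { "path" => "/posts/2" }

# Single quotes: no interpolation, so both posts end up with one literal key.
keys_single = [post_a, post_b].map do |parameters|
  cache_key = build_key.call(parameters)
  'google_analytics_api/#{cache_key}'
end

# Double quotes: cache_key is interpolated, so each post gets its own key.
keys_double = [post_a, post_b].map do |parameters|
  cache_key = build_key.call(parameters)
  "google_analytics_api/#{cache_key}"
end

puts keys_single.uniq.size # 1
puts keys_double.uniq.size # 2
p keys_double
```

With double quotes the key passed to Rails.cache.fetch differs per post, which is what the per-post pageview lookup needs.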