piotrmurach / github

Ruby interface to GitHub API
https://piotrmurach.github.io/github
MIT License
1.15k stars 316 forks source link

Conditional requests? #112

Open lolmaus opened 11 years ago

lolmaus commented 11 years ago

Hi!

I'm using github_api to work with Gists. I'm building a tiny public online service, and the limitation of 5000 GitHub API requests per hour might one day become not enough (e. g. when someone shares a link on a very popular web portal).

GitHub suggests overcoming the limitation by caching requested files serverside and making conditional requests: GitHub will only parse a request if the requested Gist has been modified since you requested it for the previous time.

The question is: is it possible to do conditional requests with github_api?

If yes, please consider this ticket to be a request for documentation. If not, a feature request.

I'd also greatly appreciate links to example implementations, etc.

Thank you! ^_^

piotrmurach commented 11 years ago

Thanks for using the library! The simple answer is no, at the moment the gem does not provide conditional requests. This is a feature that I would like to add, do you have time to look into this?

lolmaus commented 11 years ago

I'm too novice for this yet. :(

But i'm willing to learn. So if you're ready to chat with me to explain the architecture of your gem and answer lots of newbie questions, i'll be happy to participate!

lolmaus commented 11 years ago

Hey Peter, i've looked into github_api source and i've got a couple of questions to discuss with you.

  1. A Conditional Request is based on either of two values — a date and an ETag (that looks similar to an md5 hash). GitHub API reports those values in the header of its HTTP response. I have inspected an object returned by github_api and found neither of the two. Does this mean that parsing the HTTP headers is yet to be implemented?
  2. Conditional Requests only make sense if you cache Gists and other stuff serverside. Do objects returned by github_api support serialization/deserialization or something?
lolmaus commented 11 years ago

Okay, i spent some time fiddling with github_api source and i found out that the object returned by it does contain etag and last-modified, yay! They are accessible like this:

@github.gists.get(params[:gist_id]).env[:response_headers][:etag]
@github.gists.get(params[:gist_id]).env[:response_headers][:"last-modified"]
lolmaus commented 11 years ago

Okay, all we need to make a conditional request is to merge {"If-None-Match" => '0bf5df817e504a1310b0bdd36b03f8cf'} from an argument of github_api's .gists.get() into the Faraday's headers hash.

I tried to follow the code from github_api's .gists.get() to Faraday's headers, but i got totally confiused with all those arguments, options, params, paginations and other stuff. Geez, this stuff is complicated!

Hey Peter, could you please help me with that?

The question of effective saving/loading of objects returned by github_api's .gists.get() is also actual.

piotrmurach commented 11 years ago

@lolmaus Thanks for spending time looking into this. The conditional requests as you rightly pointed out need to work hand in hand with response caching. Currently the gem does not support caching of any sort. I feel that users of the library should be able to add this functionality as optional extra and easily configure it. Once caching has been setup the conditional requests would be enabled by default. I will be working on this library in couple of days as there are other things that need adding/changing. I will definitely try to implement/work on the caching layer, however this may well not be a trivial task.

jimjh commented 11 years ago

@peter-murach Has there been progress on this issue? I am happy to help.

From my perspective, the application should be able to enable caching at two places:

  1. Inside the client. The gem could allow an additional option (config.cache = true) to turn on automatic request caching. If enabled, the gem will store the ETag given in the response and attach it in future requests. This 'memoization' is tricky - would a map of SHA256(faraday_args) -> response.etag be sufficient?
  2. Inside the application. Multi-process applications (say, production apps) may decide to persist the ETag outside of the gem. In this case, the gem should allow an optional argument (etag: '644b5b0155e6404a9cc4bd9d8b1ae730') and insert an additional If-None-Match header if provided.

Alternatively, we could just use faraday-cache, which appears to be used in octokit (see pull request).

jimjh commented 11 years ago

I tried adding faraday-http-cache as middleware using config.stack, but no luck so far.

piotrmurach commented 11 years ago

@jimjh quite slow to be frank, it would be awesome if you have time to help me out and do some prototyping. I find it much easier to discuss actual code implementation.

  1. Not sure about the option. If we go down the root of the catching middleware being enabled/disabled inside the gem itself then probably this solution would be good. However, allowing to specify externally cache middleware and the storage type that the user may want to use would clash with the option. For instance, if I want to store my responses as flat files then it would be good to provide for that flexibility. Then we could assume if someone configured caching layer it means they want to cache. When it comes to ETags retrieval, i think we should make sure that we hash the get request including the query params. What do you think?
  2. Based on the get request hash the conditional request would get fired with If-None-Match header for revalidation. If we get 304 then we serve the cached copy otherwise we update etag and content and serve that. Is that what you had in mind?

I think initially it would be cool to just flesh out a simple working prototype.

jimjh commented 11 years ago

I managed to get caching to work with the faraday-http-cache gem. Please refer to this gist.

Since it uses ActiveSupport::Cache, it supports external caches such as FileStore and MemCacheStore. If you are happy with this solution, all we need to do is update the documentation.

Note that in the case of a 304, the http cache middleware will simply return the cached status i.e. the rest of the Faraday stack won't know that it is a 304. Not sure if this will break anything.

piotrmurach commented 11 years ago

@jimjh This is really good! I like the way we can have external dependency that allows for easy store configuration. I think the ActiveSupport::Cache is good but not good enough. I'd suggest to see if this could work with moneta that has a huge range of storage options more suited for the json&media type of storage. I think this should not be a problem to integrate (see moneta). Any thoughts?

Currently the 304 I think would raise faraday error but we could add specific 304 error just for this case that people could catch. Eventually, add logic that checks if http cache middleware is present and attempt to retrieve response content.

jimjh commented 11 years ago

@peter-murach I am not too familiar with moneta, but it certainly looks powerful. Since MonetaStore implements the ActiveSupport::Cache::Store interface, I managed to use it as a drop-in replacement for the memory store. Check out this new gist.

I guess if there is demand in the future we could somehow indicate that it was a cached response (such as raising a faraday error like you mentioned) For now, the application could forcefully clear the cache store (with @store.clear) if it is really concerned.

maikokuppe commented 7 years ago

Is this currently possible?