octokit / octokit.net

A GitHub API client library for .NET
https://octokitnet.readthedocs.io/en/latest/
MIT License
2.67k stars 1.07k forks source link

Support Etags #352

Open haacked opened 10 years ago

haacked commented 10 years ago

GitHub API supports Etags! But Octokit.net does not.

We should support this! But it's a teensy bit tricky. If we only support this at the ApiConnection or Connection level, then using Octokit becomes kind of a pain for everyone.

Ideally, it's part of our top level api. But that poses a problem as well, do we just add an Etag property to the models? Then what do we do about cases like this?

IObservable<Repository> GetRepositories();

The ETag for that request (or set of requests) is for the list of repositories. Each individual repository won't have an etag.

Thoughts?

haacked commented 10 years ago

For reference, this is how Octokit.objc does it: https://github.com/octokit/octokit.objc/blob/master/OctoKit/OCTResponse.m#L22

Maybe our clients need to return response objects and not just the actual entities.

shiftkey commented 10 years ago

Maybe our clients need to return response objects and not just the actual entities

I think that's the best place to put this data. It kinda sucks, but a simple abstraction to expose the HTTP data is the easiest way to do it:

public interface IResponse<T>
{
    public string ETag { get; }
    // other fields
    public T Data { get; }
}

How this mixes with our Task<T>/IObservable<T> responses is probably something worth spiking - I think it's a rather significant change...

pmacn commented 10 years ago

I kinda like what they have going on over in octokit.rb for this, akin to exposing an IResponse GetLastResponse() method/property/whatnot either on the connection or the client.

Snipped from the https://github.com/octokit/octokit.rb readme

user      = Octokit.user 'andrewpthorp'
response  = Octokit.last_response
etag      = response.headers[:etag]
pmacn commented 10 years ago

Pretty sure this approach will still cause issues with the reactive client methods that go through GetAndFlattenAllPages since there's no simple way to know when there's a new last IResponse available.

Unless you expose an IObservable<IResponse> Responses { get; } on the reactive clients. Although initially that just sounds silly, I'll let it simmer.

haacked commented 10 years ago

Ok, I woke up at 5 AM with a bit of inspiration about this I want to run by you. :smile:

Let's take a step back and think about what the purpose of etags are in the first place. For the most part, they make caches more efficient in order to reduce bandwidth. They potentially reduce load on the server if the server has an efficient means to determine whether an entity has changed.

So why should a client to Octokit.net care? Well, it allows them to make conditional requests which don't count against their rate limit. But this is a lot of work for a client to manage properly.

What if we did it for them?

We could maintain an in-memory cache of requests and responses indexed by etag. So if you call GetRepositoriesForCurrent, we'd make a conditional request and if it's not modified, we'd return the set from the cache.

This in-memory cache could be replaced with other implementations. For example, you could inject a SQLite backed cache instead of an in-memory one.

We don't have to be a perfect cache to be effective. We could use the most recently used approach if memory consumption was crazy.

Alternatively, we could just automatically cache the etag (and not the response data) and just provide a method to make the conditional request test. This would allow clients to test if their cache is up to date.

pmacn commented 10 years ago

I didn't wake up at 5am! Mainly because I didn't get to bed until 1. But this pretty much covers what I was thinking as I was falling asleep. Just caching request/response pairs.

Although I was envisioning a setup where the client was simply given the option to turn on/off caching and then optionally change the "cache provider" (i.e. the in-memory, file system, SQLite alternatives) Also possibly make the cache modular in some way so that you could share a cache between client instances. Not sure if people are going to be constructing clients left and right.

haacked commented 10 years ago

@half-ogre any thoughts on this?

haacked commented 10 years ago

/cc @paulcbetts for your thoughts. Sounds like we might be able to plug-in Akavache.Http.

half-ogre commented 10 years ago

@Haacked I think there's a place for @shiftkey's idea of Response<T> regardless of etags, but I also like the idea of a cache provider for requests along the lines of what you describe. I think not caching at all is the best default behavior, but making it easy to "turn on" simple caching (like what you describe) or drop in your own cache provider would be pretty sweet.

nigel-sampson commented 10 years ago

I have a naive implementation of the discussed approach at https://github.com/nigel-sampson/octokit.caching

shiftkey commented 9 years ago

Fun fact: we currently have this IApiRespose type which captures the relevant fields to do this:

public interface IApiResponse<out T>
{
   T Body { get; }

  IResponse HttpResponse { get; }
}

public interface IResponse
{
    object Body { get; }

    IReadOnlyDictionary<string, string> Headers { get; }

    ApiInfo ApiInfo { get; }

    HttpStatusCode StatusCode { get; }

    string ContentType { get; }
}

The part that I don't think we've tackled is how to apply the ETag on subsequent requests. I like pushing this down to the HTTP layer, so perhaps a wrapper like Octokit.Caching is still the way to go.

Anyway, I'm leaning towards doing more HttpMessageHandler-based stuff through things like #808 - taking that to it's logical conclusion would likely support something like this...

darrelmiller commented 9 years ago

If you plug in Tavis.HttpCache it will transparently take advantage of Etags/LastModified assuming you allow the response to be privately cached. It will automatically convert a GET into a conditional if the cached response is stale and will serve the response from the cache if a 304 comes back.

mitchdenny commented 4 years ago

@shiftkey pointed me at this issue from #1636 where I added a big +1 to the issue of dealing with etags/last-modified. @haacked mentioned that folks want support for this for caching, but for me the other big one (and the reason I commented in the first place) was dealing with concurrency issues.

In my scenario I have GitHub spraying events at a webhook endpoint which dynamically scales and I have no real control over which endpoint gets the request. So the ability to query to see whether a change has occurred since I read an entity from GitHub allows me to introduce some serialization in an effort to enforce consistency (this is causing me some grief right now with check runs).

For caching I think the ApiConnection support that exists today is okay provided you don't have multiple threads rolling through the same GitHub client (I'm not sure if there are any protections for this already). But having something that is attached to the response or entities would be useful.

The way I was thinking about this API was that it would be done via the entities, so you might end up with something like this:

// We get a checkrun somehow - could be via an API call, or via a webhook.
var client = GetClientFromSomewhere();
var run = await client.Checks.Runs.Get(repositoryId, checkRunId);

var update = new CheckRunUpdate(run) {
  Status = new StringEnum<CheckStatus>(CheckStatus.InProgress)
}

await client.Checks.Run.Update(update);

Behind the scenes, all types that participate in concurrency control would implement an interface which the client can use when making the request to set the appropriate headers. The act of constructing CheckRunUpdate passing in the CheckRun instance is signalling that you want to participate in the currency checks.

When a request fails because of this an exception would be thrown so that the developer could react - but there would also be a non-exception version of the API, for example:

if (await client.Checks.Run.TryUpdate(update))
{
  // Compensate for concurrency issue.
}

Ideally you could use ETags instead of date time stamps for concurrency control as it provides stronger guarantees (making an assumption here about the GitHub backend) - but the interface that the entities implement would allow them to signal the kind of concurrency control they support.

martincostello commented 2 years ago

I came across this issue after reading @JamieMagee's great blog post Making the most of GitHub rate limits and seeing if there were any improvements I could make to my Octokit.NET usage to use less of my rate limits.

It turns out for my specific C# code usage there wasn't really anything worth the investment, particularly as it requires a bit of jiggery-pokery to pass the headers to the API requests once you grab the ETag from the ApiInfo.

At the moment it seems like to easily deal with ETags and 304 responses you have to drop down to a very low level, at which point the value provided by Octokit gets less and less as the easy abstractions aren't usable with the ETags.

First-class support of some sort for this in the future would be most welcome.