propershark / timetable_cpp

Purveyor of schedule information for transit agencies via GTFS feeds and WAMP procedures.

Data versioning and idempotency #18

Open faultyserver opened 7 years ago

faultyserver commented 7 years ago

From a comment on propershark/proto#1:

Versioning in Timetable: clients should be able to cache. Timetable should (at some point) keep a monotonically increasing version number of sorts, based on the GTFS data it holds.

Since Timetable is an idempotent service (making the same request twice will yield the same results), clients can easily cache calls with identical parameters.

However, when Timetable updates its GTFS information, any cached calls should be immediately invalidated, and there is currently no way that clients can safely cache calls and know when the cache is invalidated.

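The proposal here is to add a parameter to every response from Timetable indicating the version of the data that was used to generate the response, with the following constraints:

When the underlying GTFS source changes in Timetable, this version number will change to a new value such that clients know to invalidate their caches.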

How this version number will be best represented in responses is unclear to me. One option is to simply wrap responses in another Array and include the version number there:

timetable.visits_between(...) =>

[
  <version_number>,
  [
    ["20170402 06:50:00", ...],
    ...
  ]
]

Another option would be to use a map response instead. This has the benefit of showing semantics, but also takes up a sizable amount of space to include the key names:

timetable.visits_between(...) =>

{
  version: <version_number>,
  response: [
    ["20170402 06:50:00", ...],
    ...
  ]
}

I'm partial to using the map response, as it will also allow us to add arbitrary meta information later on without requiring clients to necessarily change their parsing logic.
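To make the caching story concrete, here is a minimal client-side sketch in Python, assuming the map response above. Only the `version` and `response` keys come from the proposal; `VersionedCache` and `fetch` are hypothetical names, and `fetch` stands in for whatever actually performs the WAMP call.

```python
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

Key = Tuple[str, Tuple[Hashable, ...]]


class VersionedCache:
    """Caches call results and drops all of them when the data version changes."""

    def __init__(self, fetch: Callable[[str, Tuple[Hashable, ...]], Dict[str, Any]]):
        # `fetch` is a hypothetical stand-in for the real call to Timetable.
        # It must return a map shaped like {"version": ..., "response": ...}.
        self._fetch = fetch
        self._version: Optional[Any] = None
        self._entries: Dict[Key, Any] = {}

    def call(self, procedure: str, *args: Hashable) -> Any:
        key = (procedure, args)
        if key in self._entries:
            return self._entries[key]
        envelope = self._fetch(procedure, args)
        # Check the version before storing, so a fresh result is never cleared.
        self.observe(envelope["version"])
        self._entries[key] = envelope["response"]
        return envelope["response"]

    def observe(self, version: Any) -> None:
        # Any change in version, not just an increase, invalidates the cache.
        if version != self._version:
            self._entries.clear()
            self._version = version
```

Note that in this sketch the client only learns about a new version when it makes an uncached call, so entries can stay stale until then. Immediate invalidation would need some other signal from Timetable, which this issue does not specify.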

elliottwilliams commented 7 years ago

> The proposal here is to add a parameter to every response from Timetable indicating the version of the data that was used to generate the response with the following constraints:
>
> When the underlying GTFS source changes in Timetable, this version number will change to a new value such that clients know to invalidate their caches.

What if we just send a Last-Modified date à la HTTP? That way, cache implementations that already store a last-updated date don't need to keep track of a version number. Or is there some other reason you'd prefer a monotonically increasing version number?

> I'm partial to using the map response, as it will also allow us to add arbitrary meta information later on without requiring clients to necessarily change their parsing logic.

Agreed. This is what I'm used to seeing from RESTful APIs. If we care that much about reducing space, it's probably time to talk about protobuf.

faultyserver commented 7 years ago

I was debating just using an epoch timestamp as the version number, so I guess we could just call it last_modified or similar.

An issue I could see with conflating last_updated from the cache's perspective and last_modified from Timetable's perspective is if the timezones differ or if the clocks have drifted. It's unlikely, but unless the two are explicitly set by the same entity, I'd be concerned about possible invalidation due to those differences.

As always, we can just try it out and see what happens. I don't foresee it being an issue, anyway.
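One way to sidestep the clock and timezone concern raised above, sketched in Python under the assumption that `last_modified` is an epoch timestamp generated by Timetable: the client stores the value Timetable last sent and compares it only against later values from Timetable, never against its own clock. The function name `is_stale` is hypothetical.

```python
from typing import Optional


def is_stale(cached_last_modified: Optional[int], reported_last_modified: int) -> bool:
    """Return True if cached results came from different GTFS data."""
    # Both values originate from Timetable, so the client's own clock and
    # time zone never enter the comparison. Equality is used instead of
    # ordering so that a rollback to older data also invalidates the cache.
    return cached_last_modified != reported_last_modified
```

Treated this way, `last_modified` behaves like an opaque version token on the client, even though it happens to be a timestamp.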