ropensci / unconf17

Website for 2017 rOpenSci Unconf
http://unconf17.ropensci.org

HTTP API support code #85

Closed: craigcitro closed this issue 5 years ago

craigcitro commented 7 years ago

I spend a lot of time writing API wrappers, especially for Google (HTTP) APIs. This inevitably involves writing some of the same helpers repeatedly; I'm thinking specifically of things like:

  1. Retries: based on the response code, and possibly pattern matching on the error message, retry the request. This means counting the total number of retries and using exponential backoff with jitter or, when the server returns relevant information, using that to decide when to retry.

  2. Paging: Any API with a list method probably ends up getting a small helper that pages through results.

  3. Progress reporting: Especially for large uploads/downloads, any sort of progress indication is way better than nothing.
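
To make the retry helper in item 1 concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the API of httr or any existing package). The `Response` class, the `send` callable, the `RETRYABLE_STATUS` set, and the default predicate are all hypothetical names. The standard `Retry-After` header is used as one example of server-provided retry information, and only its seconds form is handled.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response object."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


# Assumption: which status codes are worth retrying varies by API.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def default_is_retryable(resp: Response) -> bool:
    """Decide from the status code alone; a real predicate could also
    pattern match on the error message in resp.body."""
    return resp.status in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter: rng() * min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def server_delay(resp: Response) -> Optional[float]:
    """Return the delay in seconds suggested by a Retry-After header, if any."""
    value = resp.headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None  # e.g. an HTTP-date value; not handled in this sketch


def request_with_retries(send: Callable[[], Response],
                         max_retries: int = 5,
                         base: float = 1.0,
                         cap: float = 60.0,
                         is_retryable: Callable[[Response], bool] = default_is_retryable,
                         sleep: Callable[[float], None] = time.sleep,
                         rng: Callable[[], float] = random.random) -> Response:
    """Call send() until the response is not retryable or retries run out."""
    attempt = 0
    while True:
        resp = send()
        if not is_retryable(resp) or attempt >= max_retries:
            return resp
        delay = server_delay(resp)
        if delay is None:
            delay = backoff_delay(attempt, base, cap, rng)
        sleep(delay)
        attempt += 1
```

Passing `sleep` and `rng` as arguments is a design choice that keeps the helper testable without real waiting; a caller would normally leave both at their defaults.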

If this doesn't strike a chord with anyone else, maybe the right thing to do here is cook up some common bits between (say) googlesheets and bigrquery, and just have a common place for that code. That said, if these sorts of helpers sound useful for other APIs, it might be worth trying to package them alongside or on top of httr?

cc @hadley @jennybc for thoughts.

jennybc commented 7 years ago

Yes! Some of this is done here and there but would be nice to centralize, possibly in httr2? httr has retries now, but I think there are issues that suggest more flexibility would be appreciated. One of my favourite things about gh is that @gaborcsardi built page traversal in from the very start.
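
For illustration, a generic page-traversal helper might look like the sketch below. It is written in Python and is not how gh or httr implement paging. It assumes a hypothetical `fetch_page(token)` callable that returns a dict with an `items` list and an optional `nextPageToken`; APIs that page via `Link` headers or offsets would need a different `fetch_page`.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], Dict[str, Any]],
                  max_pages: Optional[int] = None) -> Iterator[Any]:
    """Yield items from successive pages until no next-page token remains.

    fetch_page is hypothetical: it takes a page token (None for the first
    page) and returns {"items": [...], "nextPageToken": "..."}.
    """
    token = None
    pages = 0
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        pages += 1
        token = page.get("nextPageToken")
        # Stop when the server reports no further page or the cap is reached.
        if not token or (max_pages is not None and pages >= max_pages):
            return
```

Because the helper is a generator, callers can stop early (for example after the first matching item) without fetching the remaining pages.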

karthik commented 7 years ago

Love the idea. Would definitely love to participate if this isn't already being worked on elsewhere.

stephlocke commented 7 years ago

One of the biggest problems with APIs is the expected schema: additional data is often not considered a breaking (and therefore versionable) change, so responses can gain fields without warning. At the moment there's no easy way to specify the schema you expect and drop additional data.

This makes it difficult to build assertive pipelines around the data or write unit tests against it. I've resorted to NULLing columns once an API result has been transformed into a data.frame, but this is inelegant and consumes resources unnecessarily.

Building schema contract capability would be super handy!

@hrbrmstr, of course, has started a package that somewhat addresses this: swagger, which uses a Swagger definition to attempt to generate an initial package. But I'd be interested in something in httr, or maybe a JSON package, that lets us define a schema and parse JSON against it.
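
As a rough sketch of what such a schema contract could look like (in Python, for illustration only; this is not an existing httr or swagger feature), the helper below keeps only the declared fields and fails loudly when an expected field is missing or has the wrong type. The `EXPECTED` schema, its field names, and the function names are all hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema: field name -> accepted Python type(s) after JSON parsing.
EXPECTED = {"id": int, "name": str, "score": (int, float)}


def conform(record: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record restricted to the fields named in schema."""
    out = {}
    for key, expected_type in schema.items():
        if key not in record:
            raise KeyError(f"missing expected field: {key!r}")
        value = record[key]
        if not isinstance(value, expected_type):
            raise TypeError(
                f"field {key!r} has type {type(value).__name__}, "
                f"which does not match the schema"
            )
        out[key] = value
    return out  # fields not named in the schema are dropped


def parse_records(payload: str, schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Parse a JSON array of objects and conform each one to schema."""
    return [conform(rec, schema) for rec in json.loads(payload)]
```

Dropping unexpected fields at parse time means a new field in the API response never reaches the data frame, while a removed or retyped field surfaces as an error that a unit test can assert on.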

raymondben commented 7 years ago

Is it worth thinking about built-in caching support in this context? It's often useful to be able to cache the result of an API call locally so that the same code can later be run offline, or just to speed things up and save bandwidth over slow connections. Users could always add their own caching via memoise/R.cache/mocker/httpcache/whatever, but having it built in would be better.
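
A minimal sketch of the idea, in Python for illustration and not modelled on any of the packages above: responses are stored on disk under a key derived from the request, so a second run can be served without touching the network. The `fetch` callable, the cache directory name, and the choice of JSON files are assumptions, and the sketch ignores expiry and invalidation.

```python
import hashlib
import json
import os
from typing import Any, Callable, Dict, Optional


def cache_key(method: str, url: str, params: Optional[Dict[str, str]] = None) -> str:
    """Derive a stable key from the method, URL, and sorted query parameters."""
    raw = json.dumps([method.upper(), url, sorted((params or {}).items())])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_call(fetch: Callable[[str, str, Optional[Dict[str, str]]], Any],
                method: str,
                url: str,
                params: Optional[Dict[str, str]] = None,
                cache_dir: str = ".api_cache") -> Any:
    """Return the cached body for this request, calling fetch() only on a miss.

    fetch is hypothetical: it performs the real request and returns a
    JSON-serializable body.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(method, url, params) + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    body = fetch(method, url, params)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(body, fh)
    return body
```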

jennybc commented 5 years ago

The package that ultimately came of this thread is on CRAN now:

https://cran.r-project.org/package=gargle

https://gargle.r-lib.org