whosonfirst / whosonfirst-www-api

Feature request: a batch version of mapzen.places.getHierarchiesByLatLon #99

Open simonw opened 6 years ago

simonw commented 6 years ago

Or, more broadly, a general mechanism for batch API calls would be fantastic.

thisisaaronland commented 6 years ago

Tell me more?

simonw commented 6 years ago

We often need to resolve hierarchies for a bunch of places at once. For example... let's say we're returning a page with 10 events on it. Each event has a latitude/longitude point, and we want to show a breadcrumb on each event "card" showing the state, city and neighbourhood.

We do that by hitting our own internal service action which serves up an aggressively cached set of data derived from calls to getHierarchiesByLatLon. Provided those points have already been queried by our service, we'll be able to return the result direct from our cache.

BUT... what if we don't have the results cached yet? We need to make up to 10 individual calls to getHierarchiesByLatLon to pull back the data we need.

That's when we run into the mapzen 4-requests-per-second rate limit.

It would be fantastic if we could do something like this instead:

https://places.mapzen.com/v1/
    ?method=batch
    &api_key=mapzen-xxx
    &batch=URLENCODE({
        "1":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.777228,"longitude":-122.470779},
        "2":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.677228,"longitude":-122.470779},
        "3":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.577228,"longitude":-122.470779},
        "4":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.477228,"longitude":-122.470779}
    })

And get back a response something like this:

{
    "batch_results": {
        "1": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        },
        "2": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        },
        "3": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        },
        "4": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        }
    }
}
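As a rough illustration of the GET form proposed above, here is a minimal Python sketch that builds the URL-encoded `batch` parameter and decodes it again the way a server might. The `method=batch` call and the `batch` parameter are only the proposal from this comment, not an existing Mapzen API, and the helper names `build_batch_url` and `decode_batch_param` are hypothetical.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical: "method=batch" and the "batch" parameter are the proposal
# from this thread, not a documented Mapzen Places API method.
BASE_URL = "https://places.mapzen.com/v1/"


def build_batch_url(api_key, calls):
    """Return a GET URL whose `batch` parameter is URL-encoded JSON.

    Each call is keyed by "1", "2", ... so responses can be matched to requests.
    """
    batch = {str(i): call for i, call in enumerate(calls, start=1)}
    query = urlencode({
        "method": "batch",
        "api_key": api_key,
        # Compact separators keep the (already long) query string shorter.
        "batch": json.dumps(batch, separators=(",", ":")),
    })
    return BASE_URL + "?" + query


def decode_batch_param(url):
    """Recover the batch dictionary from a URL, as a server would."""
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["batch"][0])


calls = [
    {"method": "mapzen.places.getHierarchiesByLatLon",
     "latitude": lat, "longitude": -122.470779}
    for lat in (37.777228, 37.677228, 37.577228, 37.477228)
]
url = build_batch_url("mapzen-xxx", calls)
```

Even with compact JSON, four calls already produce a URL several hundred characters long, which is part of why a POST body may be the better fit.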

Doing this via a GET may not be the right thing (url-encoded JSON in a query string is ugly and long) - maybe a POST would be more sensible:

POST https://places.mapzen.com/v1/?method=batch&api_key=mapzen-xxx
{
    "1": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.777228,
        "longitude": -122.470779
    },
    "2": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.677228,
        "longitude": -122.470779
    },
    "3": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.577228,
        "longitude": -122.470779
    },
    "4": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.477228,
        "longitude": -122.470779
    }
}
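For comparison, a minimal sketch of the POST variant using only the Python standard library. It builds the request object without sending it, since the `method=batch` endpoint is only a proposal here; the `build_batch_post` helper and the `application/json` content type are assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_batch_post(api_key, batch):
    """Build (but do not send) a POST request carrying the batch as a JSON body.

    Hypothetical: the "batch" method is the proposal from this thread.
    """
    url = "https://places.mapzen.com/v1/?" + urlencode(
        {"method": "batch", "api_key": api_key}
    )
    body = json.dumps(batch).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


batch = {
    "1": {"method": "mapzen.places.getHierarchiesByLatLon",
          "latitude": 37.777228, "longitude": -122.470779},
    "2": {"method": "mapzen.places.getHierarchiesByLatLon",
          "latitude": 37.677228, "longitude": -122.470779},
}
req = build_batch_post("mapzen-xxx", batch)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, assuming such an endpoint existed.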

Here's how we built this for Eventbrite's API: https://www.eventbrite.com/developer/v3/api_overview/batching/

There are all sorts of complexities around this - the need for sensible limits, how it interacts with rate-limiting etc - but being able to group requests in this way would be really useful.

thisisaaronland commented 6 years ago

As you mention, there are all sorts of complexities around this. I could imagine (in that way I can imagine all kinds of crazy stuff at the end of the day... :-) building a thin layer of icing... I mean a "service" on top of this:

https://github.com/whosonfirst/go-whosonfirst-api

Which would basically manage all the requests, whether they are executed concurrently or not, and take care of all the boring details (rate limiting, billing, etc.) behind the scenes.
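To make the idea of a thin layer concrete, here is a minimal Python sketch (not the Go library linked above) of a function that fans a batch out over a small worker pool and returns results in request order. The `call` argument stands in for whatever performs a single API request, and the `{"stat": "error"}` shape for failures is an assumption, not the API's documented error format.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(items, call, max_workers=4):
    """Run `call(item)` for every item concurrently, preserving input order.

    `call` is a hypothetical function that performs one API request.
    A failure in one item is reported in its slot instead of aborting the batch.
    """
    def safe_call(item):
        try:
            return call(item)
        except Exception as exc:
            # Assumed error shape, for illustration only.
            return {"stat": "error", "message": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `items`.
        return list(pool.map(safe_call, items))
```

Capping `max_workers` limits concurrency but is not a rate limiter by itself; a real service would still need to pace requests against the upstream limit.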

I will have a closer look at the Eventbrite docs and start thinking about it more generally.

Do you imagine that you would want to mix and match API calls/methods inside a single batch request?

thisisaaronland commented 6 years ago

Okay, so this is incredibly wet paint but:

https://github.com/whosonfirst/go-whosonfirst-api-batch/blob/master/batch.go

As in:

./bin/wof-api-batch-server
2017/10/04 10:15:14 listening on localhost:8080
2017/10/04 10:15:18 TIMING 793.099943ms

And:

curl -s 'localhost:8080?api_key=mapzen-****' -d @batch.json | jq '.[].stat'
"ok"
"ok"
"ok"
"ok"

Where batch.json looks like this:

[
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.777228,"longitude":-122.470779},
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.677228,"longitude":-122.470779},
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.577228,"longitude":-122.470779},
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.477228,"longitude":-122.470779}
]

Question: Is there a particular reason your example batch request has numeric keys?

simonw commented 6 years ago

The numeric keys were just one way to make it easy to keep track of "I asked these questions, I got these responses back". A list is just as good; it only means the client code that matches each response to the question it asked would work very slightly differently.
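A minimal Python sketch of the two bookkeeping styles, assuming the list form returns responses in request order and the keyed form echoes the request keys back. Both helper names are hypothetical.

```python
def pair_by_position(requests_sent, responses):
    """List style: the i-th response answers the i-th request."""
    if len(requests_sent) != len(responses):
        raise ValueError("request and response counts differ")
    return list(zip(requests_sent, responses))


def pair_by_key(requests_sent, responses):
    """Keyed style: look each response up under the key the client chose.

    A key missing from the responses maps to None.
    """
    return {key: (req, responses.get(key)) for key, req in requests_sent.items()}
```

The keyed form tolerates a missing or reordered response, while the list form relies on the server preserving order.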

thisisaaronland commented 6 years ago

Another question(s):

simonw commented 6 years ago

I'd love the above as additions to a traditional request/response API, but not as a replacement for it.

A request/response batch API like the one described above would certainly need to be strict about how many batch requests are allowed. The neatest mechanism I've considered for this would be to assign each method a "cost", and allow a budget for a batch call.

For example, maybe mapzen.places.getHierarchiesByLatLon is assigned a cost of 5, and mapzen.places.getInfo has a cost of 1. If the batch API had a budget of 20, I would know that I could run 3 getHierarchiesByLatLon calls and 5 getInfo calls in a single batch request.
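The arithmetic can be sketched in a few lines of Python. The costs and the budget are the illustrative numbers from this comment, not real Mapzen values, and `batch_cost` and `fits_budget` are hypothetical helpers.

```python
# Illustrative numbers from the example above; not real Mapzen pricing.
METHOD_COSTS = {
    "mapzen.places.getHierarchiesByLatLon": 5,
    "mapzen.places.getInfo": 1,
}
BATCH_BUDGET = 20


def batch_cost(calls):
    """Sum the cost of every call in a batch; unknown methods raise KeyError."""
    return sum(METHOD_COSTS[call["method"]] for call in calls)


def fits_budget(calls, budget=BATCH_BUDGET):
    """True if the whole batch can run within the budget."""
    return batch_cost(calls) <= budget


# 3 hierarchy lookups (3 * 5) plus 5 getInfo calls (5 * 1) spend exactly 20.
example = (
    [{"method": "mapzen.places.getHierarchiesByLatLon"}] * 3
    + [{"method": "mapzen.places.getInfo"}] * 5
)
```

A server could run the same check before executing anything and reject an over-budget batch up front.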

As a consumer of an API, I want to be confident that the API will return in a sensible amount of time - so having guidance that says "you can spend up to 20 credits in a batch call and we're confident we could return in <100ms" would be really useful.

An API that returns a ticket and asks me to poll for a result... that would be fantastic for big batch jobs. I have 80,000 venue locations I'd like to geocode right now - I'd love it if I could send you the whole lot in one go and then poll for a few minutes waiting for a giant response to be ready.
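On the client side, the ticket-and-poll flow could look roughly like this Python sketch. Everything about the API shape is assumed: `check` is a hypothetical function that asks the server about a ticket, and the `"stat": "ok"` convention for a finished job is borrowed from the responses shown earlier in the thread.

```python
import time


def wait_for_result(ticket, check, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `check(ticket)` until the job is done or the timeout passes.

    `check` is a hypothetical callable returning a dict; a dict with
    "stat" == "ok" is assumed to mean the result is ready.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = check(ticket)
        if status.get("stat") == "ok":
            return status
        if clock() >= deadline:
            raise TimeoutError("ticket %r not ready after %s seconds" % (ticket, timeout))
        sleep(interval)
```

With plain `requests`, `check` would just be a function that GETs a status URL and returns the parsed JSON, so no extra client library is needed.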

The WebSocket thing: I'll be honest, from regular Python (using the requests library) I think I'd just find it too fiddly to use. I'd have to drop in a Python websocket library instead. I'd do it if I had to, but given the choice between that and a polling-based API I'd take the polling one. I'm sure node.js developers would disagree with me wildly here :)

simonw commented 6 years ago

Huh, I just noticed that you already have a mapzen.places.getInfoMulti method: https://mapzen.com/documentation/places/methods/#mapzen.places.getInfoMulti

thisisaaronland commented 6 years ago

That's good to know and I tend to share your feelings. The WS stuff seems sufficiently fiddly and complex across languages that I can imagine it rapidly outstripping any potential benefits. I might implement a proof-of-concept endpoint but mostly as an experiment...