API Definition

lukasmartinelli commented 7 years ago

Let's discuss the API endpoint here.

tyrasd commented 7 years ago

Eventually it would make sense to set up a separate repository for the api, but for now I think it's ok to discuss it here. :)

smit1678 commented 7 years ago

Taking a first stab at some notes based on the current frontend to keep this conversation going.

When a HOT project or AOI is selected the frontend shows (within that specific AOI):

number of HOT projects
number of contributors
experience of editors
number of buildings by date
km of roads by date

An API essentially already exists in the form of vector tiles, so a first take would look to extend this into a more RESTful API with endpoints that match the views you get within the OSMA frontend.

GET /stats - query by simple bounding box, return statistics
GET /stats/country/portugal - within Portugal admin boundaries, returns stats
GET /stats/project/2050 - within Tasking Manager project, return stats
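
To make the proposal concrete, here is a minimal sketch of how a server could tell these three paths apart. It is only an illustration under stated assumptions: the scope labels ("bbox", "country", "project") and the function name `parse_stats_path` are hypothetical and not part of any agreed spec.

```python
import re
from typing import Optional

# Hypothetical route table for the three endpoints proposed above.
# The scope labels are illustrative names, not an agreed-upon spec.
ROUTES = [
    (re.compile(r"^/stats/country/(?P<id>[a-z-]+)$"), "country"),
    (re.compile(r"^/stats/project/(?P<id>\d+)$"), "project"),
    (re.compile(r"^/stats$"), "bbox"),
]


def parse_stats_path(path: str) -> Optional[dict]:
    """Map a request path to the kind of area it addresses, or None if unknown."""
    for pattern, scope in ROUTES:
        match = pattern.match(path)
        if match:
            # Routes without an identifier (plain /stats) yield id=None.
            return {"scope": scope, "id": match.groupdict().get("id")}
    return None


if __name__ == "__main__":
    for example in ("/stats", "/stats/country/portugal", "/stats/project/2050"):
        print(example, "->", parse_stats_path(example))
```

The bounding box for the plain /stats case would presumably travel as a query parameter, which this sketch does not parse.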

To get temporal functionality:

GET /stats/year/day
GET /stats/country/portugal/year/day
GET /stats/project/2050/year/day

Is there a better way to do this? Anything missing here?

Are there needs for any CRUD operations yet? I don't think so but maybe.

lukasmartinelli commented 7 years ago

Implementation-wise, this could work as follows: we store all stats binned by tile in the vector tiles that are served by OSM Analytics.

Then, to aggregate stats, we determine all vector tiles that cover the shape (you can use https://github.com/mapbox/tile-cover), fetch those tiles, aggregate them, and return the result.

This can be very fast because the backend has low latency when looking up these PBFs, even if it has to look up a few dozen.
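
The cover-fetch-aggregate idea can be sketched roughly as below. This is not tile-cover itself: it only handles a plain bounding box, using the standard slippy-map tile formula, and `fetch_tile_stats` is a hypothetical callback standing in for whatever reads and decodes a PBF tile. The stat keys in the demo are made up.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Tile = Tuple[int, int, int]  # (x, y, zoom)


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-map (Web Mercator) tile index for a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    merc = math.asinh(math.tan(math.radians(lat)))
    y = int((1.0 - merc / math.pi) / 2.0 * n)
    # Clamp so points on the outer edges stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west: float, south: float, east: float, north: float,
                   zoom: int) -> List[Tile]:
    """All tiles at `zoom` touching the bounding box (no antimeridian handling)."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [(x, y, zoom)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


def aggregate_stats(tiles: Iterable[Tile],
                    fetch_tile_stats: Callable[[Tile], Dict[str, float]]
                    ) -> Dict[str, float]:
    """Sum per-tile stats; fetch_tile_stats is a hypothetical tile reader."""
    totals: Dict[str, float] = {}
    for tile in tiles:
        for key, value in fetch_tile_stats(tile).items():
            totals[key] = totals.get(key, 0) + value
    return totals


if __name__ == "__main__":
    # Stand-in reader: every tile reports the same made-up numbers.
    fake_reader = lambda tile: {"buildings": 2, "road_km": 1.5}
    print(aggregate_stats(tiles_for_bbox(-180, -85, 180, 85, 1), fake_reader))
```

Plain summing only suits additive stats such as building counts or road length. A contributor count would need the underlying user sets to be merged, since the same user can appear in several tiles.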

smit1678 commented 7 years ago

@lukasmartinelli 💯 agreed. @tyrasd in line with your thinking and new improvements?

tyrasd commented 7 years ago

(sorry for the late reply, I was at a conference last week)

Yes, the general approach looks fine.

Some implementation comments:

temporal functionality: that's a bit tricky, since the data we're working with (at the moment) doesn't have the full OSM history, so any stat that's not for the current date is going to have systematic errors. (The timeline on osm-analytics is also only kind-of-ok, because we're explicitly only displaying the recency of [last] edits there.) I'd put a real historic stats feature on the wish-list for now, until we actually have proper full history data.
Inside each osm-analytics vector tile there are mini-subtiles (the squares you see on the map), which should be used/skipped to further refine the region of interest.
The algorithm should select a suitable zoom level at which to request the raw vector tiles in order to find a good compromise between data accuracy (higher zoom is better) and amount of data to process (lower zoom is better). On the frontend, I chose the highest zoom level at which not more than 12 (or was it a different number?) tiles need to be loaded. On the API side, this threshold could probably be raised a bit (because data transfer is less of an issue there).
On the highest zoom level, instead of the aggregated data squares, the raw building/road geometries are stored. It is possible to calculate the same statistics also directly from those, but for ease of implementation it could make sense to just skip that data and use maxzoom-1 as the highest requestable one.
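
The zoom selection described above could look roughly like this. It is a sketch under assumptions: the tile budget of 12 comes from the comment above (which is itself unsure of the number), `data_max_zoom=13` is a placeholder for whatever the real maximum zoom of the tile set is, and the region is simplified to a bounding box that does not cross the antimeridian.

```python
import math
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def tile_count(west: float, south: float, east: float, north: float,
               zoom: int) -> int:
    """Number of slippy-map tiles at `zoom` that touch the bounding box."""
    n = 2 ** zoom

    def col(lon: float) -> int:
        return min(max(int((lon + 180.0) / 360.0 * n), 0), n - 1)

    def row(lat: float) -> int:
        merc = math.asinh(math.tan(math.radians(lat)))
        return min(max(int((1.0 - merc / math.pi) / 2.0 * n), 0), n - 1)

    return (col(east) - col(west) + 1) * (row(south) - row(north) + 1)


def pick_zoom(bbox: BBox, max_tiles: int = 12, data_max_zoom: int = 13,
              skip_raw_geometry_zoom: bool = True) -> int:
    """Highest zoom at which the bbox needs at most `max_tiles` tiles.

    data_max_zoom is an assumed placeholder; with skip_raw_geometry_zoom
    the search starts at maxzoom-1, as suggested for ease of implementation.
    """
    top = data_max_zoom - 1 if skip_raw_geometry_zoom else data_max_zoom
    for zoom in range(top, -1, -1):
        if tile_count(*bbox, zoom) <= max_tiles:
            return zoom
    return 0  # zoom 0 is a single tile, so this is the coarsest fallback


if __name__ == "__main__":
    print(pick_zoom((-180.0, -85.0, 180.0, 85.0)))  # whole world
    print(pick_zoom((9.0, 47.0, 9.001, 47.001)))    # a small area
```

On the API side, raising `max_tiles` trades more tile reads for finer-grained data, which matches the suggestion that the threshold could be higher there than on the frontend.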

lukasmartinelli commented 7 years ago

So who is gonna move on implementing this?

We can assist along the way.

tyrasd commented 7 years ago

//cc @cgiovando

cgiovando commented 7 years ago

Our team at the World Bank is going through the hiring process as we speak and we should be able to have a developer/firm selected by mid-April who will be working on this.

mikelmaron commented 7 years ago

per chat w/ @smit1678, it may make sense to use this abstraction, but implement processing entirely browser side, in js library. cost of implementing and maintaining API on server may cause more problems than it solves.

tyrasd commented 7 years ago

One argument for having a server-side REST-API is that our data model might not yet be considered fully stable. E.g. if we want to include full-history data in the future, the data model needs to be changed and it will be necessary to change the algorithms calculating the stats from it. If we use a server side implementation, that's not a problem. But if we provide a library it will be harder to push necessary changes downstream to the data consumers.

mikelmaron commented 7 years ago

for that case -- just bump the library version @tyrasd?

tyrasd commented 7 years ago

Yeah, but until all data consumers have migrated to such a version 2, we'd need to continue to offer data-tiles compatible with v1 (which means a ~double consumption of resources for processing and hosting during that period).

Other plus-points for a server side API:

less processing demand on client (maybe relevant for mobile devices or old hardware)
less data to transfer to the client (maybe relevant for mobile devices or other slow internet connections)
integratable into software that is not written in javascript (e.g. a python script that regularly fetches stats for a monitored region)

If we started by building it as a server-side nodejs service, we'd still be able to eventually (once we're happy with the functionality and data scheme) release the respective code as a library for everyone who's interested in integrating it that way (potentially enabling some more elaborate statistics that would be too resource-hungry for the REST API server).

esasisa commented 7 years ago

A server-side API is the more feasible solution for this situation. It provides control over data access and security for the back-end data. REST services will provide an interoperable integration solution over the data tiles.

jenningsanderson commented 7 years ago

Another use case I just came across: the ability to identify, visualize, and export individual geometries and edits aggregated by changeset comment. Since changeset comments are used by OSM communities (like the tasking manager), this would be very helpful.

...Maybe a good way to go about this would be embedding the changeset comments into additional metadata at the tile level?

tyrasd commented 6 years ago

Closing, since in the meantime the osma-api (https://github.com/GFDRR/osm-analytics-api) was created.

hotosm / osm-analytics-cruncher

API Definition #11