opentraffic / datastore

OTv2: centralized ingest and aggregation of anonymous traffic data
GNU Lesser General Public License v3.0

produce a GeoJSON file that shows where in the world data coverage is available #40

Closed drewda closed 7 years ago

drewda commented 7 years ago

Goal: The analyst UI could use a way to show where in the world traffic stats are available for querying. (The POC UI did this by having a hard-coded drop-down select with the names of metro regions. That is no longer an option, since OTv2 is built as a single platform for global coverage.)

Either when Datastore creates histogram tile files (https://github.com/opentraffic/datastore/issues/30) or when it produces data extracts for public use (https://github.com/opentraffic/datastore/issues/36), let's consider generating a GeoJSON file that includes coarse polygons around all the tile extents. This file could be placed on S3, alongside the tile files.

Perhaps this could be similar to how @kevinkreiser has Valhalla produce GeoJSON output to show multimodal transit tile coverage.

kevinkreiser commented 7 years ago

yes this could be similar to that process. currently its a bit of custom c++ code though and that doesnt really jibe with any of our systems. maybe a little python script would be in order

kevinkreiser commented 7 years ago

another issue is the additional dimension of time. because of that, crawling through all of the data will take quite a while. a better approach would be to roll new data into this "coverage". the other issue is of course that the coverage wouldnt have a notion of time in the geojson itself, so thats another thing we need to consider: how to expose that

louh commented 7 years ago

Could each polygon in the GeoJSON have start and end dates in the properties field?
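A minimal sketch of what such a feature might look like, in Python. The property names `start_date` and `end_date` and the helper `coverage_feature` are hypothetical, chosen only for illustration; nothing here is taken from the Datastore code.

```python
import json


def coverage_feature(west, south, east, north, start_date, end_date):
    """Build one GeoJSON Feature: a rectangle plus a date range.

    The property names are assumptions made for this sketch.
    """
    # GeoJSON polygon rings are closed: the first and last positions match.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"start_date": start_date, "end_date": end_date},
    }


feature = coverage_feature(-122.5, 37.75, -122.25, 38.0, "2016-01-01", "2017-06-30")
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```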

kevinkreiser commented 7 years ago

yeah indeed, we can do that. but the question is more about how do you decide when to merge tiles into a polygon and what happens when there are gaps in the times. say you have a tile that has data coverage for 2016 and 2014 but not 2015. do we still have one set of merged tiles that crosses 2015 but doesnt really have coverage there? also if we merge adjacent tiles (like we do with the transit coverage map) we basically will have the time information be the oldest observation as the start time and the newest as the end time. which, imho, would make it seem like we have more coverage in the time dimension than we do. i guess we could make the time information in the properties more complex but then youd still have the problem of knowing which time periods covered which sections of the merged polygon! at any rate, this is non trivial to say the least. we're going to have to spend some time deciding whats in and out of scope and what tradeoffs we make.
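To make the overstatement concrete, here is a small Python sketch. It assumes, purely for illustration, that coverage is tracked as a set of years per tile; the tile names and both helper functions are hypothetical.

```python
def merged_range(years_by_tile):
    """Oldest and newest year across all tiles merged into one polygon."""
    all_years = [year for years in years_by_tile.values() for year in years]
    return min(all_years), max(all_years)


def implied_but_missing(years_by_tile):
    """Years the merged range implies for each tile but the tile lacks."""
    start, end = merged_range(years_by_tile)
    full = set(range(start, end + 1))
    return {tile: sorted(full - set(years)) for tile, years in years_by_tile.items()}


# Two adjacent tiles; tile "a" has the 2015 gap described above.
tiles = {"a": {2014, 2016}, "b": {2015, 2016}}
print(merged_range(tiles))
print(implied_but_missing(tiles))
```

The merged polygon would advertise 2014 through 2016, even though tile "a" has nothing for 2015 and tile "b" has nothing for 2014.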

louh commented 7 years ago

Thanks for the background @kevinkreiser!

Instead of using polygons of merged tiles, would it make any sense to have the geojson use tile boundaries instead so that we have date ranges on a per-tile basis? (Assuming for the moment that each tile didn't have gaps in data, which is a separate issue.)

kevinkreiser commented 7 years ago

yes that was my thinking too but sadly its still more complicated... you then have the question about what time values to put in the tile. just the oldest and newest date for any given tile seems reasonable, if there are gaps so be it? we also have the problem that we have 3 sets of tiles (covering differing geographic extents), 4x4 degree, 1x1 degree, and .25x.25 degree. we probably wouldnt want different coverage maps for local, arterial and highway would we?
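For reference, a Python sketch of how the extent of a tile containing a given point could be computed at each of the three sizes. It assumes the grid is anchored at (-180, -90) and aligned to multiples of the tile size; that anchoring, the dictionary keys, and the function name are assumptions for this sketch rather than something confirmed in this thread.

```python
# Tile edge length in degrees for each roadway class (sizes from this thread).
TILE_SIZES = {"highway": 4.0, "arterial": 1.0, "local": 0.25}


def tile_bbox(hierarchy, lon, lat):
    """Return (west, south, east, north) of the tile containing (lon, lat).

    Assumes a grid anchored at (-180, -90); points exactly on the
    east or north edge of the world are not handled in this sketch.
    """
    size = TILE_SIZES[hierarchy]
    col = int((lon + 180.0) // size)
    row = int((lat + 90.0) // size)
    west = -180.0 + col * size
    south = -90.0 + row * size
    return (west, south, west + size, south + size)


for name in TILE_SIZES:
    print(name, tile_bbox(name, -122.4, 37.8))
```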

louh commented 7 years ago

We could definitely make the argument that the approximation is really what would be the most helpful for the user. In any given map view / region, the start and end dates might be all we need. We can even provide a blanket warning that there may be gaps in coverage (geographically and temporally), but at least they would know from a glance that no data exists before a certain date or no data exists beyond a certain date.

kevinkreiser commented 7 years ago

yep agree! then it seems all we need to do is figure out what the heck to do with the 3 different tile sizes for the roadway hierarchy

louh commented 7 years ago

Can we provide all three layers? Depending on the size of the viewport or selected region, we can use the most appropriate tile size.
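One way the client side of this might work, sketched in Python. The threshold `min_tiles_across` and the function name are hypothetical; the idea is just to pick the coarsest tile size that still puts a few tiles across the viewport.

```python
def pick_tile_size(viewport_width_deg, sizes=(4.0, 1.0, 0.25), min_tiles_across=4):
    """Pick the coarsest tile size that still shows enough tiles.

    min_tiles_across is an arbitrary value chosen for this sketch.
    """
    for size in sorted(sizes, reverse=True):
        if viewport_width_deg / size >= min_tiles_across:
            return size
    # Viewport is narrower than a few of the smallest tiles: use the smallest.
    return min(sizes)


print(pick_tile_size(40.0))  # wide, continental view
print(pick_tile_size(5.0))   # regional view
print(pick_tile_size(0.5))   # city view
```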

kevinkreiser commented 7 years ago

yes however there is an important difference between these and standard vector map tiles. the larger tiles only contain highways, the mid sized ones only contain mid size roads, and the smallest ones only contain the smallest roads. for the sake of coverage this approximation probably doesnt matter. the likelihood that you are zoomed in and dont have highway coverage but do have local coverage is probably insignificantly small.

louh commented 7 years ago

That's a good point, I'm glad you clarified the nature of those tile sizes. Maybe we can start with the assumption that the smallest tiles are a good proxy of data age at the other levels?

kevinkreiser commented 7 years ago

yeah i think thats the best bet at this point!

kevinkreiser commented 7 years ago

ok so i think one good way to do this is to have a coverage tileset. we can basically for every tile in the world (and at every level) have a little blob of geojson that will have the extents of that tile and the extents in time. we can either try to have the process that writes a derived tile product update the correct tile if its time range widens the bounds of time in the current tile (this requires locking), or we can have a monolithic thing that crawls all the derived data and creates these from that. then the client can just use the level one coverage tiles as an approximation for data at all levels.
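Both options boil down to the same widening step, sketched below in Python. The record shape `(tile_id, start, end)` and the function names are assumptions for illustration. The sketch runs in a single process, so it does not address the locking that concurrent writers would need.

```python
from datetime import date


def widen_range(current, start, end):
    """Return (new_range, changed) after folding in one observation window."""
    if current is None:
        return (start, end), True
    widened = (min(current[0], start), max(current[1], end))
    # Only a changed range would require rewriting the coverage tile.
    return widened, widened != current


def crawl(records):
    """Monolithic variant: fold every (tile_id, start, end) record into a dict."""
    coverage = {}
    for tile_id, start, end in records:
        coverage[tile_id], _ = widen_range(coverage.get(tile_id), start, end)
    return coverage


records = [
    ("t1", date(2016, 1, 1), date(2016, 3, 1)),
    ("t1", date(2015, 6, 1), date(2016, 2, 1)),
    ("t2", date(2017, 1, 1), date(2017, 1, 8)),
]
print(crawl(records))
```

In the incremental variant, the writer of a derived tile would call `widen_range` on the stored range and rewrite the coverage tile only when `changed` is true.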

drewda commented 7 years ago

(Can be run as part of #37)

drewda commented 7 years ago

An example: http://bl.ocks.org/anonymous/raw/a68063af4e5d177c72d081eced9ca47b/

gknisely commented 7 years ago

As soon as we get some more ref speed tiles, we can generate a larger map

drewda commented 7 years ago

Done. Job is run weekly.

Still needs to be consumed by Analyst UI: https://github.com/opentraffic/analyst-ui/issues/6