lossyrob opened this issue 4 years ago
@bitner in our discussion Monday we discussed a possible hitch around using geohashes as feature IDs - I was unsure whether the integer ID requirement came from the tippecanoe side or from mapbox-gl. Unfortunately it's mapbox-gl-js, which means fishnet cells can't be referenced by their geohash on the front end for the setFeatureState logic. However, there is a FID that is unique within each fishnet, so if we don't plan on showing multiple fishnets at once we could use that FID in a layer containing only that urban area's geometry, where those FIDs would line up with the JSON data structure returned by the API.
Other than fishnets, it looks like there are WB_ADM*_CO fields for admin0, admin1 and admin2 that could be used to construct IDs that are unique within whichever scope we need (i.e. a globally unique integer ID could be constructed from those three values if needed).
The way the front end works now, we're baking static tiles with individual layers for admin0 and admin1, so we'd need the feature IDs to be unique within those layers. Once we move to dynamically generated vector tiles, though, we wouldn't have that constraint - we'd be adding and removing layers per parent area (per admin0 area for showing admin1 boundaries, etc). And if we're getting data from the API in the above format, where the data is keyed by feature IDs for a specific indicator layer and parent admin0/admin1 region, the front end wouldn't have a problem matching those keys up with whatever ID scheme you'd like to implement.
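As a rough sketch of that matching step on the front end (the endpoint, source/layer names, and response shape here are assumptions for illustration, not the actual dashboard code):

```typescript
import mapboxgl from "mapbox-gl";

// Sketch: join API values keyed by feature ID onto map features via setFeatureState.
// Assumes the API returns a flat { [featureId]: value } object for one indicator layer.
async function applyIndicatorValues(map: mapboxgl.Map, url: string) {
  const valuesById: Record<string, number> = await (await fetch(url)).json();
  for (const [id, value] of Object.entries(valuesById)) {
    map.setFeatureState(
      { source: "admin1", sourceLayer: "admin1", id }, // placeholder source/layer names
      { value }
    );
  }
}
```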
For vector tiles, here's what we're planning to refactor to so that we can easily switch to the API if it has the same layout for accessing vector tiles (a URL-construction sketch follows the layout below):
All features have an "id", "population", and "name" property.
Admin0
data/tiles/admin0/{z}/{x}/{y}
Contains all admin0 boundaries.
The Feature ID will be constructed from the WB_ADM0_CO field in the WB supplied geometries.
Admin1
data/tiles/${admin0}/admin1/{z}/{x}/{y}, where ${admin0} is the admin 0 ID of the parent nation.
Contains all admin1 boundaries for the specified admin0 boundary.
The Feature ID will be constructed from the WB_ADM1_CO field in the WB supplied geometries.
Admin2
data/tiles/${admin0}/${admin1}/admin2/{z}/{x}/{y}, where ${admin0} is the admin 0 ID of the parent nation and ${admin1} is the admin 1 ID of the parent admin 1 boundary.
Contains all admin2 boundaries for the specified admin1 boundary.
The Feature ID will be constructed from the WB_ADM2_CO field in the WB supplied geometries.
Urban areas
data/tiles/${admin0}/urban/{z}/{x}/{y}, where ${admin0} is the admin 0 ID of the parent nation.
Contains all urban boundaries for the specified admin0 boundary.
The Feature ID will be the ID on the WB supplied geometries.
Fishnets
data/tiles/${admin0}/${urban_id}/fishnet/{z}/{x}/{y}, where ${admin0} is the admin 0 ID of the parent nation and ${urban_id} is the name of the parent urban area.
Contains all fishnet geometries for the specified urban boundary.
The Feature ID will be the FID on the WB supplied geometries.
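As a sketch of how the front end might build these templated tile URLs (the base URL and helper names here are placeholders, not part of the proposal):

```typescript
// Sketch: URL templates for the nested layout above; {z}/{x}/{y} stays literal
// so each string can be handed to a mapbox-gl vector source's "tiles" array.
const TILE_BASE = "https://example.com/data/tiles"; // placeholder base URL

const admin0Tiles = () => `${TILE_BASE}/admin0/{z}/{x}/{y}`;
const admin1Tiles = (admin0: string) => `${TILE_BASE}/${admin0}/admin1/{z}/{x}/{y}`;
const admin2Tiles = (admin0: string, admin1: string) =>
  `${TILE_BASE}/${admin0}/${admin1}/admin2/{z}/{x}/{y}`;
const urbanTiles = (admin0: string) => `${TILE_BASE}/${admin0}/urban/{z}/{x}/{y}`;
const fishnetTiles = (admin0: string, urbanId: string) =>
  `${TILE_BASE}/${admin0}/${urbanId}/fishnet/{z}/{x}/{y}`;

// e.g. map.addSource("admin1", { type: "vector", tiles: [admin1Tiles(selectedAdmin0)] });
```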
Areas we'll need data for
We're planning on processing admin1, admin2, urban, and fishnet geometries for the following countries:
- Brazil
- Indonesia
- Philippines
- Peru
cc @pieschker
@lossyrob for the vector tiles, we have everything combined into one table, so we don't need to nest as long as we have proper zoom limits set. The endpoints I have right now are:
/tiles/adm0/{z}/{x}/{y}.pbf
/tiles/adm1/{z}/{x}/{y}.pbf
/tiles/adm2/{z}/{x}/{y}.pbf
/tiles/urban/{z}/{x}/{y}.pbf
/tiles/urban_hd/{z}/{x}/{y}.pbf
/tiles/urban_fishnets/{z}/{x}/{y}.pbf
/tiles/urban_hd_fishnets/{z}/{x}/{y}.pbf
I have everything processed for all the countries. I can definitely add the nesting if necessary, but I'm not sure it adds anything for the vector tiles. My worry was more about how we pull in the attribute json files, where I think we will want to use the higher-level admin area to nest things.
@bitner What feature IDs are you using for the features? I don't believe the urban_fishnets or urban datasets have unique feature IDs, which we need for the front end. Also, we found that some of the feature IDs (WB codes) for admin1 or admin2 were actually not globally unique. Is there a better unique ID structure than using the nesting?
@lossyrob I just checked and for adm0, adm1, and adm2 both the objectid and geohash columns in what was last delivered from @bpstewar are unique across all the records.
For the urban areas and the fishnets, only the geohash is globally unique.
As part of my ingest process, I also create an auto-incrementing int, so I could always do the join here between the spatial table and the incoming attributes and deliver that id in both the vector tile service and the attribute service.
@lossyrob It also looks like the data we are going to be serving from our API is much more static than I was anticipating, at least for the time being. We are just going to be getting a single dump of measurements that we will be serving out, not a time series with new data coming in. With that being said, I will probably just bake those measurements in as columns on the spatial data tables. The vector tile engine we are using allows selecting the columns to include as attributes, so if I do things this way, /tiles/urban_fishnets/{z}/{x}/{y}.pbf would return all available measurements as attributes inside the vector tile.
If you wanted a "slimmer" tile, /tiles/urban_fishnets/{z}/{x}/{y}.pbf?id,geohash,P10 would return only the id, the geohash, and the P10 measurement.
I know you said your client makes it really fast to use the joined json / vector tile, but would a fully hydrated vector tile work for you as well? This would definitely be a more straightforward approach where we would not have to worry about any kind of hierarchical nesting. We would just want to put appropriate zoom level limits on each layer.
@bitner we can work with the collapsed hierarchy for vector tiles. It will make the transition from using static tiles to the API tiles a bit more tricky - the IDs you generate won't match up with our data IDs - but we can wait to transition the vector tiles for urban and fishnets until you have the data necessary in your API as well. We'll be able to transition to your vector tiles with admin0, admin1, and admin2 before all the data is in by using the ObjectID in the WB data (this includes IHME and JHU timeseries data, which we'll need to figure out how to get into the API down the road).
As far as baking values into vector tiles, it would be best for the front end to keep the data and the tiles separate. While the current data is static, my understanding is that there's a desire to have time series data in the API down the road, and we already have a system for matching the data and geometries that will work for both cases. Also, as the number of layers grows in the future, it would be best to be able to grab each layer's values and color breaks (and any other layer-specific configuration) separately as the layer is selected by the user.
So if we got rid of the hierarchical nesting, our preference would be to have the tiles you had in your previous comment:
/tiles/adm0/{z}/{x}/{y}.pbf
/tiles/adm1/{z}/{x}/{y}.pbf
/tiles/adm2/{z}/{x}/{y}.pbf
/tiles/urban/{z}/{x}/{y}.pbf
/tiles/urban_hd/{z}/{x}/{y}.pbf
/tiles/urban_fishnets/{z}/{x}/{y}.pbf
/tiles/urban_hd_fishnets/{z}/{x}/{y}.pbf
along with endpoints that can grab the available layers, data, and breaks for a specific boundary scope, whether that's encoded as query params or in the path, e.g. (see the sketch after this list):
/layers/adm0
/data/adm0?layer_id='DHS_something'
/breaks/adm0?layer_id='DHS_something'
/layers/adm1?admin0={admin0_id}
/data/adm1?admin0={admin0_id}&layer_id='DHS_something'
/breaks/adm1?admin0={admin0_id}&layer_id='DHS_something'
etc
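A hedged sketch of how the front end would consume endpoints like these (paths and response shapes mirror the proposal above but are assumptions, not a settled contract):

```typescript
import mapboxgl from "mapbox-gl";

// Sketch: fetch values + breaks for one indicator layer and push them into feature state.
// Response shapes are assumed: data as { [featureId]: value }, breaks as number[].
async function loadAdm1Layer(map: mapboxgl.Map, admin0Id: string, layerId: string) {
  const data: Record<string, number> = await (
    await fetch(`/data/adm1?admin0=${admin0Id}&layer_id=${layerId}`)
  ).json();
  const breaks: number[] = await (
    await fetch(`/breaks/adm1?admin0=${admin0Id}&layer_id=${layerId}`)
  ).json();

  for (const [id, value] of Object.entries(data)) {
    map.setFeatureState({ source: "adm1", sourceLayer: "adm1", id }, { value });
  }
  return breaks; // used to build the layer's step/interpolate paint expression
}
```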
I'm also worried about having the availability of layers baked into the vector tiles - it's unclear whether there are clear limits we would want to impose on the map zoom based on which geo level is selected. E.g. if you are drilled down into Brazil, it would be best to have flexibility in defining exactly what layers are available for that boundary (admin0 for Brazil, admin1 for Brazil's provinces) and to decouple that from the visual display, so someone can zoom in and out without the data changing out from under them.
Is it possible to decouple the breaks, layers, data, and vector tile endpoints in this way? Haven't thought the 'breaks' and 'layers' part of this completely through yet, so happy to think through different ways we can satisfy that.
@lossyrob
Our understanding is that what we are receiving (and creating APIs for) from WB will include and match this document: https://github.com/worldbank/HNP/blob/master/tables/RiskSchema.json, where the scale attribute indicates which scale each indicator is available for. Can you use that document as the "catalog" that you are wanting from the layers/adm0 endpoint?
In the immediate term, we will not be processing any data that is not on that list.
Are you thinking the layer would be one call for each measurement (i.e. "P2"), or that the layer is the hnp_theme, which would then include all of the indicators for that theme?
so /data/adm0?layer_id="Population" or /data/adm0?layer_id="P2"
What are you expecting for the breaks? Frankly, right now that kind of processing is not on our plate, but if there's something I can easily calculate within Postgres, I can definitely do so. We just don't have the bandwidth to go out of our way figuring that out.
If we do do the breaks, would it be better if they were just nested in with the measurements data rather than needing a separate API?
Something akin to:
layer -- adm0, adm1, adm2, etc.
parent_id -- the id of the parent (i.e. the adm0 id when the layer is adm1)
feature_id -- the id of the feature within the layer
Other notes: we do not have names for either the urban areas or the fishnet cells, only adm0, adm1, and adm2. We do not have population for any layers, although we expect to get "Urban Population" as one of the measurements ("P1") that we could bake into the tables.
{
  "layer": $layer,
  "parent_id": $parent_id,
  "breaks": {
    "per_capita": { $metric: $breaks, ... },
    "totals": { $metric: $breaks, ... }
  },
  "features": [
    {
      $feature_id: {
        "location_name": $location_name,
        "population": $population,
        "values": { $metric_name: $value, ... }
      }
    },
    ...
  ]
}
Looks like there's a bunch of things to work out for integration - too many things to be able to work out in the short timeline!
My suggestion would be for us to focus on getting the admin0, admin1 and admin2 vector tiles used in the front end as a preliminary integration point. So if we could get the endpoints that you listed:
/tiles/adm0/{z}/{x}/{y}.pbf
/tiles/adm1/{z}/{x}/{y}.pbf
/tiles/adm2/{z}/{x}/{y}.pbf
that have the Object IDs as the feature IDs, as well as a property that has the population supplied in the WB data, then it should be a good place for the front end to integrate.
For the population, we've been using the R10_sum field on the shapefiles, and I'm pretty sure it matches the R10_sum column of the zonal_BASE.csv files in the OMEGA/WB_BASEDATA STATS datasets.
Thoughts on scoping integration to this subset as a first milestone?
@bitner checking in - any update on the vector tile endpoint? We've shifted the dashboard code to rely on geometries with the "WB * CODE" field for admin0-admin2, and I think we are good to transition to your endpoints for admin0, admin1 and admin2 vector tiles if they're available. Thanks!
Hey Rob,
http://covid-publi-1v66das8fk57r-771481456.us-east-1.elb.amazonaws.com/
The following layers are available: adm0, adm1, adm2, hd_urban_fishnets, urban_areas, urban_areas_hd, urban_fishnets
If you just use the "demo" endpoint: http://covid-publi-1v66das8fk57r-771481456.us-east-1.elb.amazonaws.com/vector/demo/adm0/
You can click on a feature to see the columns that are available. Once we lock down which fields are needed, you can limit the attributes returned by the tile service by adding a ?columns=id,geohash parameter that takes a list of column names to include.
Great, we'll integrate this and let you know if we have questions. Thanks.
@bitner do the features of the vector tiles have an id that is the wb code? Like I mentioned above, the front-end depends on that being available.
@lossyrob yes, those are there, but they are not globally unique for adm1 and adm2. The attributes are exactly what are present in the geojson files we received from @bpstewar. As per above, for adm0, adm1, and adm2 a quick look suggests that the geohash and objectid fields are unique, and only the geohash is unique for hd_urban_fishnets, urban_areas, urban_areas_hd, urban_fishnets.
There is also the ogc_fid which is an id that I created as just a sequential int key when I loaded the data. I can add this to any attribute dumps that we create as well.
I was able to work around the feature ID issue by using promoteId. We can now use the geometries of the API's admin0 - admin3 vector tiles! :tada:
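For reference, a minimal sketch of the promoteId workaround in mapbox-gl-js (the tile URL, source name, source-layer name, and promoted property are placeholders; the real values depend on the tile service):

```typescript
import mapboxgl from "mapbox-gl";

// Sketch: promote a tile property to the feature id so setFeatureState can be used
// even though the tiles don't carry integer feature ids themselves.
function addAdm1Source(map: mapboxgl.Map) {
  map.addSource("adm1", {
    type: "vector",
    tiles: ["https://example.com/tiles/adm1/{z}/{x}/{y}.pbf"], // placeholder URL
    promoteId: "wb_adm1_co", // assumed property holding the WB admin1 code
  });
}

// Values fetched from the attribute API can then be joined on that same code:
function setAdm1Value(map: mapboxgl.Map, wbAdm1Code: number, value: number) {
  map.setFeatureState({ source: "adm1", sourceLayer: "adm1", id: wbAdm1Code }, { value });
}
```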
@bitner I got adm0 tiles to work, but I'm getting 500s visiting:
http://covid-publi-1v66das8fk57r-771481456.us-east-1.elb.amazonaws.com/vector/demo/adm1/
and seeing CORS errors when switching to adm1, e.g.:
[Screen Shot 2020-07-07 at 5 15 36 PM] https://user-images.githubusercontent.com/2320142/86844255-cbbba380-c075-11ea-9287-fcff2b6f69eb.png
Also hitting the adm1 table via http://covid-publi-1v66das8fk57r-771481456.us-east-1.elb.amazonaws.com/#/Tiles/tile_vector_tiles__table___z___x___y__pbf_get seems to give a 500.
Let me know if I'm using the wrong URI or if there's a fix on your side.
Nope, looks like there's an issue on our side somewhere. I'll look into it.
I need to kick our service so it picks up a configuration change that was making things bomb. Everything will go down for just a bit.
Hey @lossyrob -- hitting some cloudformation snafus. We'll need to get things cleaned up and running tomorrow morning unfortunately.
@bitner No worries, thanks for the update.
@lossyrob fixed all of our issues. We've added an environment that we won't be doing active testing on, which you should switch to using: http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/
@lossyrob As an FYI, @bpstewar has updated the geojson files, so wb_adm1_co and wb_adm2_co are now globally unique.
@lossyrob
I've added an initial API for the attributes.
Basic usage to get just the attributes for a single feature, using either the geohash or the ogc_fid: http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/{layer}/{geohash or ogc_fid}
You can limit the columns returned by passing a list of column names, e.g. columns=wb_adm0_na,wb_adm1_na,geohash,ogc_fid
You can select the id column to query by with keycol={ogc_fid | geohash | wb_adm0_co | wb_adm1_co | wb_adm2_co | objectid} (pick one). If it is not a primary column for the layer you selected, it will return multiple results.
When multiple results are returned you can use reportkey= to specify the key that is used as the index for the returned json object.
OpenAPI docs -> http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/#/Vector%20Tile%20API/feature_info_vector_info__table___id__get
Example to get all the adm1 attributes selected by the adm0 code: http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/adm1/1?keycol=wb_adm0_co&reportkey=wb_adm1_co
Like I said, we can easily modify the interaction on the API, I just wanted to get something out the door for you to work with to get the ball rolling.
Additionally, when using the vector tile API, if you select the *_full layer (i.e. "adm0_full"), ALL attributes are available to be returned as part of the vector tiles, although I would definitely recommend combining that with the columns= option; otherwise it returns a LOT of data.
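For example, a hedged sketch of calling the info endpoint described above from the front end (the parameter names come from this thread; the response shape is an assumption):

```typescript
// Sketch: fetch adm1 attributes for a given adm0 code, keyed by wb_adm1_co.
const API_BASE =
  "http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com";

async function fetchAdm1Attributes(adm0Code: number) {
  const params = new URLSearchParams({
    keycol: "wb_adm0_co",    // query by the parent adm0 code
    reportkey: "wb_adm1_co", // index the returned object by the adm1 code
    columns: "wb_adm0_na,wb_adm1_na,geohash,ogc_fid",
  });
  const res = await fetch(`${API_BASE}/vector/info/adm1/${adm0Code}?${params}`);
  return res.json(); // assumed shape: { [wb_adm1_co]: { ...selected columns } }
}
```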
This issue describes the data utilized by the initial version of the HNP dashboard as well as the formats used. This should help inform the features of the API so that the dashboard can be transitioned to pulling data from the API.
Currently the dashboard pulls from static JSON files and vector tiles that are served through a Netlify deployment.
The vector tiles have two layers: admin0 and admin1. Features are keyed by integer IDs that are unique across both layers. The feature properties include total population data. admin0 and admin1 boundaries and population data are from the GOST team's data (from the OneDrive data delivery). Geometries are supplemented with Natural Earth where geometries are not available. Currently we are only processing Brazil provinces for the admin1 layer. Future work will include a layer for urban areas.
The data we are using currently comes from the IHME COVID-19 projection dataset and JHU case data. IHME contains time-series projection data as well as summary information for some admin0 and admin1 boundaries. JHU has daily case, death, and recovery counts per admin0.
The data format for the time series is similar between JHU and IHME. Below is a description of the formats:
IHME time series data
where
IHME summary data
The summary data is the same as the timeseries data, without the values being split across dates. There are no levels implemented:
JHU data
This is similar to IHME time series data without the concept of levels. It includes the metrics "cases", "deaths", "active", and "recovered", where active is computed from cases and recovered.
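A hedged sketch of the general shape, assuming values keyed by feature ID and then by date (key names are illustrative, not the exact keys in the delivered files):

```typescript
// Illustrative only: per-feature, per-date JHU metrics as described above.
interface JhuMetrics {
  cases: number;
  deaths: number;
  recovered: number;
  active: number; // computed from cases and recovered
}

type JhuTimeSeries = {
  [featureId: string]: {
    [date: string]: JhuMetrics; // e.g. "2020-07-01"
  };
};
```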
Configuration data
In each case we need to know class breaks in order to style the polygons on the map. The "configuration" file for the datasets describes the dates that the time series data applies to (in order to populate the time slider) and the class breaks for both total and per-capita metrics. Currently only the per-capita data is used in the dashboard.
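Similarly, a hedged sketch of what the configuration data carries (dates for the time slider plus per-metric class breaks for totals and per-capita; key names are illustrative):

```typescript
// Illustrative only: configuration for one dataset.
interface DatasetConfig {
  dates: string[]; // the dates the time series data applies to (drives the time slider)
  breaks: {
    totals: Record<string, number[]>;     // metric name -> class break values
    per_capita: Record<string, number[]>; // metric name -> class break values
  };
}
```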
IHME Config
Where
JHU Config