unhcr / dataviz-streamgraph-explorer

https://unhcr.github.io/dataviz-streamgraph-explorer/#types=1-2-3-4-5-6-7
BSD 3-Clause "New" or "Revised" License

Pull Data from CKAN API #6

Open curran opened 7 years ago

curran commented 7 years ago

Ideally we'd pull the data we need when we need it, so we don't need to download a 2MB file on page load.

curran commented 7 years ago

This requires some thought as to the granularity of the data requests.

The source streamgraph and destination streamgraph need different aggregations of the data, after filtering by selected population types. Does this mean that we'll need to make two separate API calls each time we change the population type filter selection?
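To make the question concrete, here is a minimal sketch of the two aggregations computed from one filtered set of rows. It assumes flat records of the form (year, destination, origin, population type, total); the sample rows, the `aggregate` helper, and `both_streamgraphs` are all hypothetical and only illustrate the shape of the problem, not the project's actual code.

```python
from collections import defaultdict

# Hypothetical flat rows (made-up numbers):
# (year, country_dest_iso, country_origin_iso, population_type_id, total)
ROWS = [
    (2015, "DEU", "SYR", 1, 100),
    (2015, "DEU", "AFG", 2, 40),
    (2015, "TUR", "SYR", 1, 250),
    (2016, "DEU", "SYR", 1, 120),
    (2016, "TUR", "SYR", 3, 30),
]


def aggregate(rows, selected_types, key_index):
    """Sum totals per (year, country) for the selected population types.

    key_index picks the grouping column: 1 for destination, 2 for origin.
    """
    totals = defaultdict(int)
    for row in rows:
        if row[3] in selected_types:
            totals[(row[0], row[key_index])] += row[4]
    return dict(totals)


def both_streamgraphs(rows, selected_types):
    """Return (by_origin, by_destination) for the same type filter."""
    by_origin = aggregate(rows, selected_types, key_index=2)
    by_destination = aggregate(rows, selected_types, key_index=1)
    return by_origin, by_destination
```

If the aggregation moves to the server, each of these two groupings becomes its own query, which is where the "two API calls per filter change" concern comes from.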

matthewsmawfield commented 7 years ago

Yes, ideally we should pull data from CKAN. I imagine that for the source/destination data, the table schema would look like:

year, country_dest_iso, country_origin_iso, population_type_id, total

The data is then flatter, so it can be queried easily, and grouped and summed through the SQL parameter of the CKAN API call. It would be interesting to compare the size of the flat file, using ids and ISO codes, with the 2MB processed data.json file. Most importantly, though, from this stage on the data.json structure should be as close as possible to the future API JSON response. Currently the data.json structure is nested, and a CKAN endpoint could not return JSON in that same shape.
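As a rough sketch of what such a call could look like, the snippet below builds a request URL for CKAN's `datastore_search_sql` action with the grouping and summing done in SQL. The base URL and resource id are placeholders (no real endpoint is named in this thread), and the helper names are hypothetical. The column names follow the schema proposed above.

```python
from urllib.parse import urlencode

# Placeholders, not real values: the CKAN host and the DataStore resource id
# would come from wherever the flat table ends up being published.
CKAN_BASE = "https://ckan.example.org"
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"

ALLOWED_GROUP_COLUMNS = ("country_origin_iso", "country_dest_iso")


def build_sql(group_column, type_ids):
    """Build a GROUP BY query for one streamgraph (origin or destination)."""
    if group_column not in ALLOWED_GROUP_COLUMNS:
        raise ValueError(f"unexpected group column: {group_column}")
    if not type_ids:
        raise ValueError("at least one population type id is required")
    # int() guards against anything but numeric ids ending up in the SQL.
    ids = ", ".join(str(int(t)) for t in sorted(type_ids))
    return (
        f"SELECT year, {group_column}, SUM(total) AS total "
        f'FROM "{RESOURCE_ID}" '
        f"WHERE population_type_id IN ({ids}) "
        f"GROUP BY year, {group_column} "
        f"ORDER BY year"
    )


def build_url(group_column, type_ids):
    """Return the full datastore_search_sql URL for the given grouping."""
    query = urlencode({"sql": build_sql(group_column, type_ids)})
    return f"{CKAN_BASE}/api/3/action/datastore_search_sql?{query}"
```

Calling `build_url("country_origin_iso", [1, 2])` and `build_url("country_dest_iso", [1, 2])` would give the two requests needed for a single filter selection.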

curran commented 7 years ago

Yes, that schema looks accurate.

I think what's important is to have a place in the code that expects the JSON structure that will be returned from the API. Meaning, there can be an intermediate "unpacking" step that only applies to the packed JSON data, and returns a data structure that will match with the API response. But I think I get what you mean, that there needs to be a place where we can cleanly swap out the data source later. I'll work with that in mind.
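A minimal sketch of that "unpacking" idea follows. The nested shape used here (origin, then destination, then population type, then year) is purely an assumption for illustration, since the real data.json layout is not described in this thread; the function names are hypothetical too. The point is only that both sources end up as the same list of flat records.

```python
def unpack(packed):
    """Flatten an ASSUMED nested structure into flat records.

    Assumed shape: {origin_iso: {dest_iso: {type_id: {year: total}}}}.
    JSON object keys are strings, so ids and years are converted to int.
    """
    rows = []
    for origin, destinations in packed.items():
        for dest, types in destinations.items():
            for type_id, years in types.items():
                for year, total in years.items():
                    rows.append({
                        "year": int(year),
                        "country_dest_iso": dest,
                        "country_origin_iso": origin,
                        "population_type_id": int(type_id),
                        "total": total,
                    })
    return rows


def records_from_api(payload):
    """Extract flat records from a CKAN DataStore-style JSON response,
    which nests rows under result.records."""
    return payload["result"]["records"]


def load_rows(source, *, packed):
    """Single swap point: the rest of the code only ever sees flat records."""
    return unpack(source) if packed else records_from_api(source)
```

With this arrangement, switching from the packed file to the API would mean changing only the `packed` flag (and where `source` comes from), not the code that consumes the rows.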