Closed danfowler closed 7 years ago
I get the logic here and I think this is already possible.
However, I want to flag some bigger points from discussion with @danfowler
Perhaps we are already at this point but I'd then like to see a good bit of analysis and summary of the services we have looked at and a good sample of datasets e.g. at least one municipal and country geo where we have worked this through.
Otherwise my suggestion for right now would be:
@danfowler @rgrp
I don't like that the original post here ties the clarification on location as a dimension to the webservice that we have, related to OpenSpending. It immediately frames the problem in terms of "how can we make FDP work with the OpenSpending web service", rather than "FDP has a location dimension referred to in the spec: how do we make a location dimension valuable/usable".
Let's not talk directly about the web service or frontend reconciliation. Those are not the issues, and obviously we are already doing that.
Let's talk about how a location dimension is valuable in the spec, and ensure the spec delivers. Otherwise, why do we have a location dimension in the first place?
This is a case in which we feel that the spec is not being descriptive enough to allow implementors to make use of the values stored in the location dimensions.
The solution here is to make the spec more descriptive, much as we provide a means to indicate not only that a dimension is a classification but also which kind of classification it is.
I am very reluctant to use attribute names for anything, and in fact I would remove the suggestion and limitation for specific attribute names altogether. I think we should approach this in a similar manner to classifications. We should add an optional sub-categorisation for location dimensions, allowing publishers to specify the geo feature that the dimension describes (e.g. city / country / region / address). This subcategory could be used by implementors as a hint for reconciliation mechanisms etc. [Actually, having this on the dimension level might be too restricting and we should probably add it as an attribute property - to be discussed]
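A minimal sketch of how an implementor might consume such a hint, assuming it lives on the attribute level. The property name `geoFeature`, the attribute names, and the column names are all hypothetical and not part of the spec; the surrounding structure only loosely follows the FDP model.

```python
# Hypothetical sketch: `geoFeature` is NOT part of the FDP spec. It stands in
# for the optional sub-categorisation proposed above.
location_dimension = {
    "dimensionType": "location",
    "attributes": {
        "region_name": {"source": "Region", "geoFeature": "region"},
        "city_name": {"source": "City", "geoFeature": "city"},
        "notes": {"source": "Notes"},  # no hint: implementors skip or guess
    },
}


def attributes_by_geo_feature(dimension):
    """Group attribute names by their (hypothetical) geoFeature hint."""
    grouped = {}
    for name, attr in dimension["attributes"].items():
        feature = attr.get("geoFeature")
        if feature is not None:
            grouped.setdefault(feature, []).append(name)
    return grouped


hints = attributes_by_geo_feature(location_dimension)
print(hints)
```

A reconciliation mechanism could then pick a strategy per feature type (for example, a gazetteer lookup for `region`) instead of relying on attribute names.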
In case we decide to adopt the OS types as first class citizens in the FDP spec, we should use that mechanism instead.
This thread, frictionlessdata/specs#79, our experience with the OCDS location extension (lots of in-principle demand, but lots of chicken-and-egg challenges in understanding what granularity of modelling to aim for, and what different forms of analysis this might enable), and emerging work on Ag Investment data all make me wonder whether there is scope for some shared work on better understanding the different user stories, and structures of input data, for location.
For example, whilst schools and hospitals might have an easily described physical location, other analysis may want to know about the intended catchment area for those services, or about whether a service budget which has a broad geographical scope in general is allocated only to a particular kind of sub-region. Aside from the question of which gazetteers to use for location, being clear to publishers whether a location is a 'physical location' or a 'delivery area' etc. may be important.
I also wonder whether it is useful to encourage publishers to provide multiple levels of admin geography where they have it, as:
Whilst in theory the higher levels could be inferred from the lower level, in practice we know this is very difficult, and for the global analyst relies on having the chimera of a robust and updated gazetteer covering all levels of admin geography.
@timgdavies I totally agree that the spec needs to have better means to describe location data. As you said, just putting a 'location' column with no context is not good enough for more in-depth analyses.
So, besides encouraging publishers to provide more detail in the raw data, we should also make the spec flexible to support data sets with missing data. This is achievable in two ways I think:
`constant` attributes or package metadata. This would remove the need to infer higher levels of location by.
Are there any taxonomies which we can take inspiration from, and might fit this purpose?
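As a rough illustration of the `constant` approach, the sketch below resolves each location attribute for a row either from a source column or from a constant supplied in the model. The helper `resolve_attribute`, the column names, and the row values are hypothetical; only the `source`/`constant` distinction and the "MD" country code come from this thread.

```python
def resolve_attribute(attr, row):
    """Return the attribute value for one row: a column lookup or a constant."""
    if "source" in attr:
        return row.get(attr["source"])
    return attr.get("constant")


# Hypothetical attributes: the dataset has a region column but no country
# column, so the country is supplied once as a constant.
attributes = {
    "region": {"source": "Region"},
    "countryCode": {"constant": "MD"},
}

row = {"Region": "Example Region", "Amount": "1000"}  # made-up spending line
resolved = {name: resolve_attribute(attr, row) for name, attr in attributes.items()}
print(resolved)
```

With this in place a consumer gets the higher admin level for every row without having to infer it from the lower one.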
@akariv my request here would still be more and more detailed user stories / walkthroughs so it is clear:
Currently, the specification of the `location` dimension is limited to the following suggested attributes: `code`, `title`, and `codeList`. This is useful if you have a dataset with a column containing well-formatted `code`s that you can look up in a well-defined `codeList` (e.g. one of these). But location information can come in many forms (e.g. city, state, latitude, longitude, etc.) and we might want to explicitly capture this information in the model (see https://github.com/openspending/fiscal-data-package/issues/79 for a discussion on adding `latitude` and `longitude`).
We have a use case for modeling location information for a budget dataset that specifies region names like the following:
OpenSpending is currently developing a web service that will accept location parameters (e.g. a country code and region name) and return a `geojson` polygon. We would like to use information from the dataset above to lookup polygons for visualizing on a map.
Currently, we are using a combination of the top-level `countryCode` (in this case "MD") and a single attribute in the location dimension to derive a location for each spending line. This is problematic for two reasons:
`countryCode` is a top-level element that applies to the entire dataset and, for this reason, can also be an array. It would be preferable to source all information for doing geographic visualization from one place that directly describes the spending line (that is, the `model`).
`location` dimension attribute so we know how to do a lookup more generically?
As a first step in being more prescriptive with the `location` dimension, we need to have some standard, predictable place to put `countryCode` in the model. One quick way we can do this is to add `countryCode` as a suggested attribute to a location dimension. In cases where the countryCode is not listed in the dataset, we can provide it via a `constant` keyword. While we are doing this, we might as well add more fields to get some specificity in the location data we are mapping. I look toward OCDS for some potential fields:
`postalCode`
`countryName`
`streetAddress`
`region`
`locality`
So that the location dimension could look like this:
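A minimal sketch of such a dimension, written as a Python dict and serialized to JSON. The attribute names come from the suggestions above; the source column names (`Region`, `Locality`, `Address`, `Postcode`) and the overall layout are assumptions for illustration, not an agreed part of the spec.

```python
import json

# Hypothetical column names on the right-hand side; only the attribute names
# are taken from the suggestion above.
location_dimension = {
    "dimensionType": "location",
    "attributes": {
        "countryCode": {"constant": "MD"},  # not in the data, so a constant
        "region": {"source": "Region"},
        "locality": {"source": "Locality"},
        "streetAddress": {"source": "Address"},
        "postalCode": {"source": "Postcode"},
    },
}

descriptor_json = json.dumps({"location": location_dimension}, indent=2)
print(descriptor_json)
```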
What do you think @rgrp @pwalsh @akariv ?