openspending / fiscal-data-package

MOVED TO https://github.com/frictionlessdata/specs/issues?q=is%3Aopen+is%3Aissue+label%3A%22Fiscal+Data+Package%22

On Location #140

Closed: danfowler closed this issue 7 years ago

danfowler commented 8 years ago

Currently, the specification suggests only three attributes for the location dimension: code, title, and codeList. This works if your dataset has a column of well-formatted codes that you can look up in a well-defined codeList (e.g. one of these). But location information comes in many forms (e.g. city, state, latitude, longitude), and we might want to capture it explicitly in the model (see https://github.com/openspending/fiscal-data-package/issues/79 for a discussion on adding latitude and longitude). We have a use case for modeling location information for a budget dataset that specifies region names like the following:

OpenSpending is currently developing a web service that will accept location parameters (e.g. a country code and a region name) and return a GeoJSON polygon. We would like to use information from the dataset above to look up polygons for visualizing on a map.

Currently, we are using a combination of the top-level countryCode (in this case "MD") and a single attribute in the location dimension to derive a location for each spending line. This is problematic for two reasons:

As a first step in being more prescriptive with the location dimension, we need a standard, predictable place to put countryCode in the model. One quick way to do this is to add countryCode as a suggested attribute of a location dimension. Where the countryCode is not listed in the dataset, we can provide it via the constant keyword. While we are doing this, we might as well add more fields to make the location data we are mapping more specific. I looked to OCDS for some potential fields:

So that the location dimension could look like this:

"location": {
  "dimensionType": "location",
  "attributes": {
    "region": {
      "source": "admin1"
    },
    "countryCode": {
      "constant": "MD"
    }
  }
}
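To make the difference between "source" and "constant" concrete, here is a minimal Python sketch of how an implementor might resolve the attributes above for each spending line, producing the (countryCode, region) pair that a lookup service would need. The helper name resolve_attributes and the sample rows (with placeholder region names) are hypothetical; the sketch assumes "source" names a column in the data and "constant" supplies the same value for every row, as in the example.

```python
from typing import Any, Dict, List

# The location dimension from the proposal, as a Python dict.
# "source" names a column in the data; "constant" is a fixed value.
LOCATION_DIMENSION: Dict[str, Any] = {
    "dimensionType": "location",
    "attributes": {
        "region": {"source": "admin1"},
        "countryCode": {"constant": "MD"},
    },
}


def resolve_attributes(dimension: Dict[str, Any], row: Dict[str, str]) -> Dict[str, str]:
    """Return {attribute name: value} for one data row (hypothetical helper).

    An attribute with "constant" gets that value for every row;
    otherwise the value is read from the column named by "source".
    """
    resolved: Dict[str, str] = {}
    for name, spec in dimension["attributes"].items():
        if "constant" in spec:
            resolved[name] = spec["constant"]
        else:
            resolved[name] = row[spec["source"]]
    return resolved


# Illustrative rows with placeholder names, not real budget data.
rows: List[Dict[str, str]] = [
    {"admin1": "Region A", "amount": "1000"},
    {"admin1": "Region B", "amount": "250"},
]
for row in rows:
    print(resolve_attributes(LOCATION_DIMENSION, row))
```

Because countryCode comes from a constant, the dataset itself never needs a country column, yet every resolved row still carries the country needed to disambiguate the region name.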

What do you think, @rgrp @pwalsh @akariv?

rufuspollock commented 8 years ago

I get the logic here, and I think this is already possible.

However, I want to flag some bigger points from discussion with @danfowler

Perhaps we are already at this point, but I'd then like to see a good bit of analysis and a summary of the services we have looked at, plus a good sample of datasets (e.g. at least one municipal and one country-level geography) where we have worked this through.

Otherwise my suggestion for right now would be:

pwalsh commented 8 years ago

@danfowler @rgrp

I don't like that the original post here ties clarifying location as a dimension to our OpenSpending web service. It immediately frames the problem as "how can we make FDP work with the OpenSpending web service" rather than "FDP has a location dimension referred to in the spec: how do we make a location dimension valuable/usable".

Let's not talk directly about the web service or frontend reconciliation. Those are not the issues, and obviously we are already doing that.

Let's talk about how a location dimension is valuable in the spec, and ensure the spec delivers. Otherwise, why do we have a location dimension in the first place?

akariv commented 8 years ago

This is a case in which we feel the spec is not descriptive enough to allow implementors to make use of the values stored in location dimensions.

The solution here is to make the spec more descriptive, much as we provide a means to indicate not only that a dimension is a classification but also which kind of classification it is.

I am very reluctant to use attribute names for anything, and in fact I would remove the suggestion and limitation for specific attribute names altogether. I think we should approach this in a similar manner to classifications. We should add an optional sub-categorisation for location dimensions that lets publishers specify which geographic feature the dimension describes (e.g. city / country / region / address). Implementors could use this subcategory as a hint for reconciliation mechanisms and the like. [Actually, having this on the dimension level might be too restrictive, and we should probably add it as an attribute property - to be discussed]

In case we decide to adopt the OS types as first-class citizens in the FDP spec, we should use that mechanism instead.
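As a rough illustration of the sub-categorisation proposed above, the Python sketch below assumes a hypothetical "locationType" property that can sit on the dimension (as a default) or on an individual attribute (as an override), covering both placements discussed. The property name, the vocabulary, and the attribute names are all assumptions; the point is that an implementor reads the hint rather than relying on what the attributes are called.

```python
from typing import Any, Dict, Optional

# Hypothetical vocabulary of geographic features. The thread only gives
# city / country / region / address as examples; nothing is agreed yet.
LOCATION_TYPES = {"address", "city", "region", "country"}

# Hypothetical "locationType" property at both levels: on the dimension
# as a default, and on an attribute as an override. The attribute names
# ("name", "parent") carry no meaning for the implementor.
DIMENSION: Dict[str, Any] = {
    "dimensionType": "location",
    "locationType": "region",
    "attributes": {
        "name": {"source": "admin1"},
        "parent": {"constant": "MD", "locationType": "country"},
    },
}


def location_type(dimension: Dict[str, Any], attribute: str) -> Optional[str]:
    """Return the feature type an implementor could use as a reconciliation hint.

    An attribute-level hint overrides the dimension-level one; None means
    no hint was given at either level.
    """
    spec = dimension["attributes"][attribute]
    hint = spec.get("locationType", dimension.get("locationType"))
    if hint is not None and hint not in LOCATION_TYPES:
        raise ValueError(f"unknown locationType: {hint!r}")
    return hint


for attr in DIMENSION["attributes"]:
    print(attr, "->", location_type(DIMENSION, attr))
```

Letting the attribute-level value override the dimension-level one keeps the simple case (one hint per dimension) short while still allowing a dimension that mixes, say, a region name and its country.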

timgdavies commented 8 years ago

Given this thread, frictionlessdata/specs#79, our experience with the OCDS location extension (lots of in-principle demand, but lots of chicken-and-egg challenges in understanding what granularity of modelling to aim for and what forms of analysis it might enable), and emerging work on Ag Investment data, I wonder if there might be scope for some shared work on better understanding the different user stories, and structures of input data, for location?

For example, whilst schools and hospitals might have an easily described physical location, other analyses may want to know about the intended catchment area for those services, or whether a service budget with a broad general geographic scope is allocated only to a particular kind of sub-region. Aside from the question of which gazetteers to use for location, making clear to publishers whether a location is a 'physical location' or a 'delivery area' etc. may be important.

I also wonder whether it is useful to encourage publishers to provide multiple levels of admin geography where they have it, as:

Whilst in theory the higher levels could be inferred from the lower level, in practice we know this is very difficult, and for the global analyst it relies on the chimera of a robust, up-to-date gazetteer covering all levels of admin geography.
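A minimal Python sketch of why publisher-supplied levels help: when each row carries every admin level it has, an implementor can read the coarser levels directly instead of inferring them from a gazetteer, and can still fall back to whatever level is present when data is missing. The column names (country, admin1, admin2) and the placeholder rows are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical column names for admin geography, coarsest first.
ADMIN_LEVELS: List[str] = ["country", "admin1", "admin2"]


def provided_levels(row: Dict[str, str]) -> List[Tuple[str, str]]:
    """(level, value) pairs the publisher actually filled in, coarse to fine."""
    pairs: List[Tuple[str, str]] = []
    for level in ADMIN_LEVELS:
        value = row.get(level, "").strip()
        if value:
            pairs.append((level, value))
    return pairs


def most_specific(row: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """The finest level available, or None if no location was given."""
    pairs = provided_levels(row)
    return pairs[-1] if pairs else None


# Illustrative rows with placeholder names, not real data.
rows = [
    {"country": "MD", "admin1": "Region A", "admin2": "District 1"},
    {"country": "MD", "admin1": "Region B", "admin2": ""},
    {"country": "MD"},
]
for row in rows:
    print(most_specific(row), provided_levels(row))
```

The full coarse-to-fine path from provided_levels can serve as a disambiguating key for a lookup (two districts with the same name in different regions stay distinct), while most_specific degrades gracefully for rows that only name a country.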

akariv commented 8 years ago

@timgdavies I totally agree that the spec needs to have better means to describe location data. As you said, just putting a 'location' column with no context is not good enough for more in-depth analyses.

So, besides encouraging publishers to provide more detail in the raw data, we should also make the spec flexible enough to support datasets with missing data. I think this is achievable in two ways:

Are there any taxonomies which we can take inspiration from, and might fit this purpose?

rufuspollock commented 8 years ago

@akariv my request here would still be more and more detailed user stories / walkthroughs so it is clear:

pwalsh commented 7 years ago

Moving to https://github.com/frictionlessdata/datapackage-fiscal/issues/3