openbudgets / platform

Tracking issues related to the work around the OpenBudgets.eu platform (WP4).
GNU General Public License v3.0

"Codelist" like selection of organization while uploading data through OS Packager #32

Closed skarampatakis closed 7 years ago

skarampatakis commented 7 years ago

@pwalsh When uploading data through OS Packager at some point you choose the Continent, Country and City which the dataset refers to.


Continent and Country already use a predefined codelist, but when it comes to the City, you can only enter free text.

From my POV, we could improve this behavior by using "codelists" for cities as well.

The benefits are that:

  1. you eliminate typing errors
  2. you homogenize the way entities are defined
  3. you can exploit open data available about the entity in the LOD cloud (population, geo-information, multilingual labels, to name a few)
  4. you eliminate disambiguation errors, see https://en.wikipedia.org/wiki/Athens_(disambiguation) (in Greece we fortunately have only one city named Athens, but in the USA? I believe you can think of an example from your country as well).
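
To illustrate points 1 and 4, here is a minimal Python sketch of a codelist lookup. Everything in it is an assumption made for illustration: the `CITY_CODELIST` entries, the `example:` identifiers and the `lookup_city` function are hypothetical and do not come from OS Packager or Cosmopolitan.

```python
# Hypothetical codelist: every entry has a stable identifier plus
# human-readable context. The "example:" identifiers are placeholders.
CITY_CODELIST = [
    {"id": "example:athens-gr", "name": "Athens", "country": "GR"},
    {"id": "example:athens-us-ga", "name": "Athens", "country": "US"},
    {"id": "example:athens-us-oh", "name": "Athens", "country": "US"},
]


def lookup_city(name, country=None):
    """Return the codelist entries matching a name, optionally filtered by country."""
    wanted = name.strip().casefold()
    return [
        entry
        for entry in CITY_CODELIST
        if entry["name"].casefold() == wanted
        and (country is None or entry["country"] == country)
    ]


# The name alone is ambiguous: several entries share it.
print(len(lookup_city("athens")))
# Filtering by the country chosen earlier in the form narrows the choice.
print([entry["id"] for entry in lookup_city("Athens", country="GR")])
```

With a picker backed by such a list, the value stored for the dataset is the identifier rather than the typed string, so a misspelling simply yields no match instead of a new, unlinked city.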

For data we have transformed using custom pipelines, we manually selected the entities that represent the municipalities. For example, for the municipality of Athens we chose the resource http://el.dbpedia.org/resource/Δήμος_Αθηναίων as the obeu-dimension:organization. We then use the entity information from that source to get labels, polygons, population, etc.
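
As a rough sketch of the "get labels" step, the following Python builds a SPARQL query for the labels of one resource. The helper name `build_label_query` is hypothetical, and the sketch only constructs the query text; which endpoint to send it to, and which properties carry polygons or population, depends on the source and is not shown here.

```python
def build_label_query(resource_uri):
    """Build a SPARQL query that asks for all rdfs:label values of one resource."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        "}"
    )


# The resource chosen for the municipality of Athens, as mentioned above.
ATHENS = "http://el.dbpedia.org/resource/Δήμος_Αθηναίων"
print(build_label_query(ATHENS))
```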

We could take a similar approach with entities from appropriate datasets such as GeoNames, Wikidata or DBpedia. This kind of reuse is what makes Linked Data valuable, and as a Linked Data project I believe we should exploit such trivial things.

pwalsh commented 7 years ago

@skarampatakis we'd certainly be happy for a pull request on OS Packager that can use a filtered code list for cities as well, either by using the data already in cosmopolitan, extending it, or using an alternate source, especially if it is trivial, as you say.

However, first we'd need a spec/requirements for how this would work, and not only in terms of the interaction for the user, but also for how the data is serialised on the Data Package, and presumably for your use case, how that info is further processed towards the OBEU triple store.

Changes in this area would also need to be applicable generically (meaning, to any dataset that any person uploads to OpenSpending for any country), as that is the scope of use cases that the core OpenSpending platform solves.

skarampatakis commented 7 years ago

I see that there is already a cities endpoint

http://cosmopolitan.openspending.org/v1/cities/

but it returns an error.

If I understand correctly it already imports data from Geonames. It could be extended to support other Data sources as well.

In all of these sources, entities have unique identifiers. E.g. in GeoNames, http://sws.geonames.org/264371/ is Athens. On the Data Package side there is the field cityCode, which is currently filled from the free-text input. We could use these identifiers, or Cosmopolitan-specific identifiers if you prefer, as the value of cityCode instead of a free-text string.

E.g. when using GeoNames IDs, that would be cityCode: "geodata:264371".
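
A minimal Python sketch of how such a value could be resolved back to the GeoNames URI. It assumes that the `geodata` prefix stands for `http://sws.geonames.org/` and that the local part is numeric; `PREFIXES`, `CODE_PATTERN` and `expand_city_code` are hypothetical names, not part of any existing tool.

```python
import re

# Assumption: the "geodata" prefix maps to the GeoNames URI base.
PREFIXES = {"geodata": "http://sws.geonames.org/"}

# Assumption: a code is "<prefix>:<numeric id>".
CODE_PATTERN = re.compile(r"^(?P<prefix>[a-z]+):(?P<local>[0-9]+)$")


def expand_city_code(city_code):
    """Turn a prefixed cityCode such as "geodata:264371" into a full URI."""
    match = CODE_PATTERN.match(city_code)
    if match is None or match.group("prefix") not in PREFIXES:
        raise ValueError(f"not a recognised city code: {city_code!r}")
    return f"{PREFIXES[match.group('prefix')]}{match.group('local')}/"


# Fragment of a descriptor carrying the proposed value.
descriptor_fragment = {"cityCode": "geodata:264371"}
print(expand_city_code(descriptor_fragment["cityCode"]))
# prints http://sws.geonames.org/264371/
```

A free-text value such as "Athens" is rejected with a `ValueError`, which is the point of the change: the field then only accepts identifiers.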

For the scenario where data is uploaded through OS Packager and the hook is then run, the FDPtoRDF pipeline would transform this into a single triple with obeu-dimension:organization as the property.
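
A sketch of what that single triple could look like, written in Python and serialised as N-Triples. This is not the output of the FDPtoRDF pipeline: the namespace bound to `obeu-dimension`, the subject URI and the function name `organization_triple` are all assumptions made for illustration.

```python
# Assumption: namespace behind the obeu-dimension prefix.
OBEU_DIMENSION = "http://data.openbudgets.eu/ontology/dsd/dimension/"


def organization_triple(subject_uri, city_uri):
    """Serialise one subject/organization/city triple in N-Triples syntax."""
    predicate = OBEU_DIMENSION + "organization"
    return f"<{subject_uri}> <{predicate}> <{city_uri}> ."


# Hypothetical subject URI; the object is the GeoNames URI for Athens.
print(
    organization_triple(
        "http://example.org/dataset/athens-budget",
        "http://sws.geonames.org/264371/",
    )
)
```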

pwalsh commented 7 years ago

> If I understand correctly it already imports data from Geonames. It could be extended to support other Data sources as well.

Yes, that is the idea.

pwalsh commented 7 years ago

At OKI, we'd love to see this implemented in Cosmopolitan and in the Packager. In discussions with @skarampatakis, he has indicated that he would work on this functionality. Closing this issue as it is duplicated elsewhere.