whosonfirst-data / whosonfirst-data

Who's On First is a gazetteer of places.
http://www.whosonfirst.org/

Update administrative records in Spain #706

Closed dbauszus-glx closed 11 months ago

dbauszus-glx commented 7 years ago

Hi guys, I find your work a great help to everyone involved with mapping administrative boundaries but I am struggling to wrap my head around how the WOF data is related to the base OSM data.

So far I have used the old Mapzen Border bundles and updated OpenStreetMap directly when I encountered a problem. Am I right that changes to the OSM data source will not be reflected in the WOF data?

I am currently reviewing municipios in Spain. These boundaries should be tagged admin_level 8.

I reviewed all the regions without a municipal code in the old Mapzen border admin level 8 geojson file for Spain.

Most of the time these areas are communal holdings or special areas such as 'Guipuscoa'. [image]

I will edit the object in the OSM data and demote the admin_level from 8 to 9. Only boundaries which are municipalities and as such have a municipality code should be on admin level 8. I notice however Guipuscoa is already removed from the localadmin level when I query with the spelunker tool.

Another problematic municipality is Medinya. [image]

This is a new municipality (June 2015) for which I do not know the municipality code yet. The municipality border in OSM is good. In the bundler dataset, however, the old border of neighbouring Sant Julia de Ramis persists. [image]

Then again, other boundaries are fixed in the bundler export. The municipality of Lloret de Mar was wrong in the borders download and in OSM. I fixed the boundary in OSM this morning and notice that the bundler download has the fixed boundary for this municipality.

[image]

Is there any process in place to ensure that WOF boundary fixes will be migrated to OSM, or will this always be a manual process?

A more serious problem is with l'Alqueria d'Asnar. I assume that the representation in OSM is correct. [image]

The object exists twice in the WOF dataset (bundler download).

It appears once as l'Alqeria d'Asnar and once as L'Alqueria d'Asnar (the main polygon to the West). Most worrying, the polygon in the South is not part of the municipality. [image]

This area should be a member of the municipality Alcocer de Planes further North. [image]

Please let me know how we can help to keep both WOF and OSM itself up to date.

D

dbauszus-glx commented 7 years ago

In the WOF bundler dataset there are a couple of areas with the municipality code 31000. These seem to be national parks and should not be listed with the municipality borders.

According to INE the municipality code 31000 does not exist.

The old borders dataset used to have the municipality code from OSM which is incorrect as well. I will now remove these codes from OSM as they are not official INE codes and the areas should not be listed as municipalities.

For example: [image]

Another of these 31000 areas doesn't seem to exist at all. I cannot find any reference to Faceria de Cogullo Alto, which has been cut out of 'Villamayor de Monjardin'.

I believe the OSM boundary to be correct.

[image]

stepps00 commented 7 years ago

Hey @dbauszus-glx - thank you for the detailed issue!

First, it is important to note that the data in Who's On First is not sourced from OSM. Records in Who's On First are sourced from a variety of open data sources and may not reflect (as you've seen) what you'll see in OSM. And, after spot-checking the data in Spain, it looks like the majority of our administrative records are sourced from Quattroshapes.

If you are aware of any open administrative datasets for Spain that are available under a CC0 or CC-BY equivalent license, I'd be happy to work on updating administrative records in Spain. This is the best approach because we have inconsistencies at many admin levels in Spain. Thanks again!

nvkelso commented 7 years ago

@dbauszus-glx Mapzen discontinued our OSM Borders project in October 2016, as noted in the alert at the top of that page: https://mapzen.com/data/borders/. The data was updated one last time in October 2016 and hasn't been updated since. There were problems keeping the data fresh, and with limited staffing we're focusing on Who's On First.

It sounds like there are some exploded multi-part polygon parts that should be recombined. That's fairly easy to do as long as you mention which should be combined (or point to a new source that does that already).
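
As a rough illustration of what recombining exploded parts could look like, here is a minimal Python sketch that groups GeoJSON features by a shared property and merges each group into one MultiPolygon. The `parent_id` property and the sample coordinates are hypothetical and not taken from the WOF data; in practice the grouping would come from an explicit list of which parts belong together, since names can differ in spelling between parts.

```python
from collections import defaultdict


def recombine_parts(features, key="parent_id"):
    """Merge Polygon/MultiPolygon features that share `key` into one MultiPolygon each.

    `key` is a hypothetical property naming the record a part belongs to.
    Only that property is carried over to the merged feature.
    """
    groups = defaultdict(list)
    for feature in features:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = list(geom["coordinates"])
        else:
            raise ValueError(f"unsupported geometry type: {geom['type']}")
        groups[feature["properties"][key]].extend(polygons)

    return [
        {
            "type": "Feature",
            "properties": {key: value},
            "geometry": {"type": "MultiPolygon", "coordinates": polygons},
        }
        for value, polygons in groups.items()
    ]


# Two made-up square parts that should be a single record.
parts = [
    {
        "type": "Feature",
        "properties": {"parent_id": "A"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
        },
    },
    {
        "type": "Feature",
        "properties": {"parent_id": "A"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[2, 0], [3, 0], [3, 1], [2, 1], [2, 0]]],
        },
    },
]

merged = recombine_parts(parts)
print(len(merged), merged[0]["geometry"]["type"])
```

This only concatenates the parts; it does not dissolve shared edges or validate the rings.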

By the way, I really like your Github avatar! :)

dbauszus-glx commented 7 years ago

Thanks for the detailed answer. I now have a much better understanding of the WOF dataset. I wasn't aware of Quattroshapes but will look into it at some point. We are in the process of building a census data product and use a range of open source borders. The problem is that we use OSM as a basemap, and the different open source boundaries don't match well at international boundaries and coastlines.

We have very limited resources ourselves and I refuse to write yet another API to get the boundaries from our database. The non-developers in our organisation have a poor understanding of how much work it is to write an API in the first place.

Ideally I would like to get to a state where the boundaries in OSM are consistent and regularly checked. I am thinking about writing a tool which will get polygons through Overpass and check these against the official codes. However, this will require me to set up our own Overpass server first. And of course we need to make sure that the admin boundaries are complete in OSM to begin with. This is a huge task, but we are thinking about paid internships for students to tackle one country each over the summer.

Myself, I have just completed Spain. The municipalities (admin_level=8) are now good in OSM. I just double checked and loaded the complete set through overpass and matched the polygons to the INE list as well as the aggregated census population count from 2011. Please feel free to check and update the WOF dataset if you like.

PS. I only checked mainland Spain and the Med Islands (not the Atlantic Islands).

<osm-script output="xml" timeout="999">
    <union>
        <query type="relation">
            <has-kv k="admin_level" v="8"/>
            <has-kv k="ine:municipio" modv="" v=""/>
            <bbox-query e="4.74021853245" n="44.2733978846" s="35.3246364453" w="-9.713828408"/>
        </query>
    </union>
    <union>
        <item/>
        <recurse type="down"/>
    </union>
    <print mode="body"/>
</osm-script>
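
A check of the query result against the official codes could be sketched in Python roughly as follows. This is only an illustration under assumptions: the relations are assumed to be already parsed into dicts with `id` and `tags` keys (the shape Overpass uses for JSON output), and the relation ids and the codes `00001` to `00003` are made-up placeholders, not real INE codes. Only `31000` is taken from this thread, as an example of a code that INE does not list.

```python
from collections import Counter


def check_ine_codes(relations, official_codes):
    """Compare `ine:municipio` tags on relations with a set of official codes."""
    official = set(official_codes)
    seen = Counter()
    missing_tag = []   # relation ids without an ine:municipio value
    unknown = []       # (relation id, code) pairs not in the official list
    for rel in relations:
        code = rel.get("tags", {}).get("ine:municipio")
        if not code:
            missing_tag.append(rel["id"])
            continue
        seen[code] += 1
        if code not in official:
            unknown.append((rel["id"], code))
    return {
        "missing_tag": missing_tag,
        "unknown": unknown,
        "duplicated": sorted(c for c, n in seen.items() if n > 1),
        "not_in_osm": sorted(official - set(seen)),
    }


# Placeholder data for illustration only.
official_codes = {"00001", "00002", "00003"}
relations = [
    {"type": "relation", "id": 1, "tags": {"admin_level": "8", "ine:municipio": "00001"}},
    {"type": "relation", "id": 2, "tags": {"admin_level": "8", "ine:municipio": "00001"}},
    {"type": "relation", "id": 3, "tags": {"admin_level": "8", "ine:municipio": "31000"}},
    {"type": "relation", "id": 4, "tags": {"admin_level": "8"}},
]

report = check_ine_codes(relations, official_codes)
print(report)
```

The four report lists correspond to the kinds of problems described in this thread: untagged boundaries, codes that INE does not list, duplicated objects, and municipalities with no matching boundary.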
dbauszus-glx commented 7 years ago

I just found one missing line element which I fixed. With that I had to do around 300 fixes to the admin boundaries for admin_level 8 in Spain. This took about 14 hours. So, yeah, it definitely is a huge job to get the OSM admin boundaries to be consistent enough for spatial analysis.

The avatar is from my last gig in Mexico. I tried to convince the powers to base their cadastre on OSM but failed. Sadly Mexico is somewhat behind in the adoption of Open Source anything.

stepps00 commented 7 years ago

@dbauszus-glx - thank you for the INE list, this will come in handy. I am going to review our administrative data in Spain and update exploded multi-part polygons that were incorrectly imported as separate features.

I would also like to update geometries to reflect any administrative changes that have occurred since import. I found the https://datos.gob.es site with some administrative data, though the license seems to vary between datasets. What open dataset (aside from the INE list) did you use to update the geometries in OSM?

(Also changing the issue title..)

stepps00 commented 7 years ago

Overall, WOF is relatively accurate with administrative counts in Spain. Notes, per placetype:

| Placetype | WOF | Source | Note |
| --- | --- | --- | --- |
| Macroregion | 20 | 19 | 404227403 is not valid, this area is part of the Basque macroregion |
| Region | 56 | 50 | Four "Conjurisdicciones interprovinciales" records, others merged |
| Localadmin | 8,195 | 8,118 | Combo of multi-part issues and merges at this level |
| Locality | 4,657 | N/A | See Spain IGN |
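
For reference, the size of each gap follows directly from the counts above. A small Python sketch of the arithmetic (Locality is left out because it has no source count):

```python
# (WOF count, source count) per placetype, copied from the notes above.
counts = {
    "macroregion": (20, 19),
    "region": (56, 50),
    "localadmin": (8195, 8118),
}

# Number of WOF records beyond what the source lists.
surplus = {placetype: wof - source for placetype, (wof, source) in counts.items()}
print(surplus)  # {'macroregion': 1, 'region': 6, 'localadmin': 77}
```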
nvkelso commented 7 years ago

Spanish localities in Who’s On First are a Quattroshapes invention based on giving urbanized areas settlement names, so it doesn’t surprise me there is a gap here. The better thing to use is the Localadmin comparison.

dbauszus-glx commented 7 years ago

I did not have to draw any new boundaries. Most of the issues were name and code fixes. Admin level changes, such as demoting commonwealth land. A lot of broken outer boundaries. Some duplicates. Etc.
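
One way to spot broken outer boundaries automatically is to check that the member ways of a relation can be chained into closed rings. The Python sketch below is an assumption about how such a check might look, not the method used for these fixes: each way is given as a list of node ids (made-up values here), and any node that ends an odd number of open ways marks a gap.

```python
from collections import Counter


def dangling_nodes(ways):
    """Return node ids where the outer ways of a boundary fail to join up.

    `ways` is a list of node id sequences. A way whose first and last node
    are equal is already a closed ring. For the remaining ways, every end
    node must be shared by an even number of way ends, otherwise the
    boundary has a gap at that node.
    """
    ends = Counter()
    for nodes in ways:
        if len(nodes) < 2 or nodes[0] == nodes[-1]:
            continue  # empty, single-node, or already closed
        ends[nodes[0]] += 1
        ends[nodes[-1]] += 1
    return sorted(node for node, count in ends.items() if count % 2)


# Three ways that chain into one closed ring: 1 -> 3 -> 5 -> 1.
closed = [[1, 2, 3], [3, 4, 5], [5, 6, 1]]
# The same boundary with the last way missing.
broken = [[1, 2, 3], [3, 4, 5]]

print(dangling_nodes(closed))  # []
print(dangling_nodes(broken))  # [1, 5]
```

This catches missing line elements but not self-intersections or wrong inner/outer roles.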

I mostly use the Eurostat communes boundaries.

I also have an old government dataset of unknown source which was given to me from a colleague.

I found the WOF dataset helpful as well.

nvkelso commented 11 months ago

Official data from https://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=CAANE# only shows

We're updating all the geometries, and taking care of a few merges, splits, and new features at the localadmin level.

nvkelso commented 11 months ago

Related cleanup of ESP: Ceuta and Melilla locality need placetype_alt as localadmin https://github.com/whosonfirst-data/whosonfirst-data/issues/2161

nvkelso commented 11 months ago

Resolved via https://github.com/whosonfirst-data/whosonfirst-data-admin-es/pull/37