opennorth / represent-canada-data

Digital electoral boundary files for Canada, its provinces and municipalities
http://represent.opennorth.ca/
Other
46 stars 17 forks source link

Represent API: Data

Represent is the open database of Canadian elected officials and electoral districts. It provides a REST API to boundary, representative, and postcode resources.

This repository stores the digital boundary files for the database. The represent-canada repository is what's running at represent.opennorth.ca.

Layout

Boundary files are under boundaries/. Most are stored in a directory tree matching Open Civic Data Division Identifiers (OCD-ID) starting at boundaries/ocd-division/. Federal, provincial and territorial boundary files are further scoped by redistribution year.

A few boundary files exist outside the OCD-ID tree. Some, like ca_cd and ca_csd, are Census geography files whose OCD-ID would clash with Canada's. Others are the sources of multiple boundary sets in the API, each with a different OCD-ID.

License

Open North has permission to redistribute all shapefiles in this repository. Please read the overall license and the LICENSE.txt file in each directory to know your rights. In some cases, you will not have permission to redistribute the shapefile.

Completeness

Open North lacks permission to redistribute the shapefiles of some boundary sets in the API. Refer to the source_url, licence_url and data_url of those boundary sets to get copies of those shapefiles.

Provenance

All datasets are from government sources, with one exception: the postal codeOM dataset in the postcodes/fed directory is from Geocoder.ca. The definition.py files will have more details on sources and any modifications made to the files. Postal CodeOM is an official mark of Canada Post Corporation.

Maintenance

One-time setup

# Invoke must not be installed globally.
pip uninstall invoke

# Create a virtual environment.
mkvirtualenv representdata

# Install the requirements.
pip install -r requirements.txt flake8
npm install -g esri-dump

Regular tasks

For all the following commands, add --base=path/to/private/data to run them on the private repository.

Load the virtual environment:

pyenv activate representdata

List the available maintenance tasks:

invoke -l

Update the OCD-IDs:

curl -O https://raw.githubusercontent.com/opencivicdata/ocd-division-ids/master/identifiers/country-ca.csv

Maintain definition files

Make the code style consistent:

flake8

Check that all definition.py files are valid:

invoke definitions

Check that all data directories contain a LICENSE.txt (don't run on the private repository):

invoke licenses

Check that the source, data and license URLs work:

invoke urls

Find and correct the URLs in definition.py files. If you update a licence_url, you may need to update other occurrences in LICENSE.txt, constants.py and this master spreadsheet. Once all corrections are made, re-run definitions and urls.

If you update a data_url, update its shapefile, name_func and id_func following the instructions below.

Download shapefiles

After downloading shapefiles, but before committing, you must process the shapefiles, as described in the next step.

Check for old boundaries that may require manual updates:

invoke manual

Update a specific out-of-date shapefile. This task updates the last_updated date in the definition.py file:

invoke shapefiles --base=boundaries/ocd-division/country:ca/province:qc/2011

Or, update all out-of-date shapefiles. The output may contain additional instructions:

invoke shapefiles

Some shapefiles are online but require exceptional processing (invoke shapefiles will report Unrecognized extension). Remember to update last_updated in definition.py:

rm -f boundaries/ca_nb_wards/wards.*
esri-dump http://geonb.snb.ca/arcgis/rest/services/GeoNB_ENB_MunicipalElections/MapServer/1 > boundaries/ca_nb_wards/wards.geojson
ogr2ogr -f "ESRI Shapefile" boundaries/ca_nb_wards boundaries/ca_nb_wards/wards.geojson

After running these commands, you may have both untracked files and deleted files. This is due to sources changing filenames. If you git add the directory, the untracked files will be staged to be added and the deleted files will be staged to be removed.

After running these commands, you may have only modified the definition.py file, i.e. only the last_updated value is changed. That's also fine.

Quebec

After receiving a new boundary file for all municipalities in Quebec, you need to update the definition.py file in ca_qc_districts.

  1. Update the filename in ruby boundaries/ca_qc_districts/sets.rb
  2. Run ruby boundaries/ca_qc_districts/sets.rb
  3. Copy the output into the appropriate section of qc/districts/definition.py
  4. Separately define the boundaries of jurisdictions whose names duplicate others' (Plessisville (32045))
  5. Perform the other checks in the comments of the file

After loading the boundaries into Represent, check La Tuque and Sept-Îles in particular. Delete any boundary sets from Represent that are not current.

Process shapefiles

Get information about the shapefile, for example:

ogrinfo -al -geom=NO boundaries/ocd-division/country:ca/province:qc/2011

Determine the attribute for the feature's name and, if it exists, the attribute for the feature's public identifier.

For features that are numbered like "Ward 1", if there is no attribute for the numeric identifier, we can extract it from the name, like id_func=lambda f: re.sub(r'\D', '', f.get('WARD')). Similarly, if there is no attribute for the name, we can build it from the numeric identifier, like name_func=lambda f: 'Ward %s' % f.get('WARD').

For features that aren't numbered like "Ward 1", determining the public identifier may be tricky: the ID should be discoverable online; no two features should have the same ID; and OBJECTID is never the ID.

Read this section of the example definition.py file for help writing a name_func and id_func.

If you're updating many shapefiles, it may be long to run ogrinfo on each. Run:

../represent-canada/manage.py analyzeshapefiles -d . > manifest
git diff manifest

Once you've updated the definition.py files to correctly extract the feature's name and public identifier, you can commit the definition.py files and data files.

Cleanup

Fix file permissions:

invoke permissions

Check if the data request process spreadsheet is out-of-date:

invoke spreadsheet

Or less verbose:

invoke spreadsheet --base=. --private-base=../represent-canada-private-data

Postcode concordances

Each data directory under concordances/ has a README explaining how to source and update its concordances. If the concordances are more than a year old and can't be sourced, they should be removed. To do so, substitute the corresponding values in the above READMEs for <slug> and <source>:

fab alpheus update_concordances:args="<slug> <source> data/shapefiles/public/concordances/empty.csv"

Postcodes

Each data directory under postcodes/ has a README explaining how to source and update its postcodes.

Contact

Please use GitHub Issues for bug reports. You may also contact represent@opennorth.ca.

Acknowledgements

We would like to express our gratitude to Kent Mewhort at the Canadian Internet Policy and Public Interest Clinic (CIPPIC), whose legal research (PDF) made it possible for this repository to be made public.