healthysustainablecities / global-indicators

An open-source tool for calculating and reporting spatial indicators for healthy, sustainable cities worldwide using open or custom data.
MIT License

Process Issues #105

Closed · nicholas-david closed 3 years ago

nicholas-david commented 3 years ago

@carlhiggs

At the moment I'm trying to run the process from start to finish and I'm running into a few kinks.

  1. When running the setup_config script, I receive the following error:

File "setup_config.py", line 64, in exec(open('./data/GTFS/gtfs_config.py').read()) FileNotFoundError: [Errno 2] No such file or directory: './data/GTFS/gtfs_config.py'

  2. I had the config files saved somewhere else on my computer, so I tried to continue to the next step. I ran python sp.py adelaide and got the following error:

Failed to read configuration file /home/jovyan/work/process/configuration/adelaide.py.

This code assumes the name of a known city to be passed as an argument, however none was provided.

Configuration python files containing the dictionaries 'config' and 'parameters' are written
to the ./configuration directory for cities through use of the set up configuration script setup_config.py,
like so: 
python setup_config.py auckland

or, to generate set up scripts for all cities
python setup_config.py

[Errno 2] No such file or directory: '/home/jovyan/work/process/configuration/adelaide.py'

Traceback (most recent call last):
  File "sp.py", line 58, in <module>
    print(f"\nGlobal indicators project {today}\n\nProcess city: {config['study_region'].title()}\n")
NameError: name 'config' is not defined

So, I changed line 50 to read .json instead of .py, and switched configuration_file (a defined variable that seemed to be the same thing as what config used to be, and which is not mentioned in the rest of the code) to config on lines 50, 52, and 54. Now I get the error

Traceback (most recent call last):
  File "sp.py", line 58, in <module>
    print(f"\nGlobal indicators project {today}\n\nProcess city: {config['study_region'].title()}\n")
TypeError: string indices must be integers

nicholas-david commented 3 years ago

Do you know what the issue is?

carlhiggs commented 3 years ago

Hi @nicholas-david, regarding your points

1) It's true that the file ./process/data/GTFS/gtfs_config.py is currently coded as a dependency in setup_config.py. The file is retrievable from the input data folder for GTFS on CloudStor; however, I have just added this file in a commit to the global indicators master branch. Of course, the GTFS input data will still need to be retrieved. An optional approach could be to introduce a check so that if gtfs_config.py does not exist, all GTFS analyses are skipped; that's something to consider for later - for now, if you have that file, the setup script should run.
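A minimal sketch of such a check, assuming setup_config.py loads the file via exec as the traceback above suggests (the gtfs_analysis flag is hypothetical, not part of the current code):

import os

gtfs_config_path = './data/GTFS/gtfs_config.py'
if os.path.exists(gtfs_config_path):
    # load the GTFS metadata dictionaries into the current namespace
    exec(open(gtfs_config_path).read())
    gtfs_analysis = True
else:
    # hypothetical flag that downstream steps could check to skip GTFS analyses
    print(f'{gtfs_config_path} not found; GTFS analyses will be skipped.')
    gtfs_analysis = False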

So if you do a git pull for the master branch, maybe try again?

When run successfully it should look like:

(base) root@docker-desktop:/home/jovyan/work/process# python setup_config.py

Study region and all cities configuration files were generated for 25 regions: adelaide, auckland, baltimore, bangkok, barcelona, belfast, bern, chennai, cologne, ghent, graz, hanoi, hong_kong, lisbon, maiduguri, melbourne, mexico_city, odense, olomouc, phoenix, sao_paulo, seattle, sydney, valencia, vic

or,

(base) root@docker-desktop:/home/jovyan/work/process# python setup_config.py adelaide

Study region and all cities configuration files were generated for 1 regions: adelaide

The reason the gtfs config file is required is that it contains the metadata needed to access the city-specific gtfs geopackages (ie. file names and locations, layer names, etc). This saves having to write these details twice in separate configuration files, which could be error-prone. Ideally at some point a re-factor occurs which integrates all setup (pre-process, process, and GTFS) in some minimal and human-readable way.
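Purely for illustration, the kind of metadata it holds might look something like this (keys and values below are hypothetical, not the actual contents of gtfs_config.py):

# hypothetical sketch of city-specific GTFS metadata, not the real gtfs_config.py
GTFS = {
    'adelaide': {
        'gpkg': 'data/GTFS/gtfs_frequent_transit_headway_2020-08-27_python.gpkg',  # file name and location
        'layer': 'adelaide',  # layer name within the geopackage
    },
}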

2) sp.py reads in the adelaide.py code file that was generated in the above step.
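Roughly speaking, that loading step amounts to something like the following sketch (inferred from the error messages above, not the verbatim source):

import sys

city = sys.argv[1]  # e.g. 'adelaide'
configuration_file = f'./configuration/{city}.py'
# defines the dictionaries 'config' and 'parameters' in the current namespace
exec(open(configuration_file).read())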

Aside: Technically the same information could more or less be retrieved, with some re-writing, by importing setup_config.py (check out setup_aggr.py, which imports this and is in turn imported by aggr.py). However, maybe there is something nice about the bespoke generated files as an archive of how the analysis was run at a particular point in time, which could be subject to later re-parameterisation. If that is the perspective being taken, the config file should probably be dated --- and even then, it could just be a text record rather than the actual source of parameters, which could still be retrieved by importing the setup_config.py code directly.

I don't think it's a good idea to change line 50 to json, as that will be working with an out-of-date version of the configuration file. The current version is generated by running python setup_config.py as above. This should work now that the GTFS script is in the repository.

tl;dr: I reckon give it a go again after a git pull or merge with master to ensure the GTFS config file is present, and then generate the config .py scripts for each study region. Touch wood, step (2) will no longer be an issue.

Hope that helps,

Carl

nicholas-david commented 3 years ago

Hey @carlhiggs ,

Thank you for the updates. That makes sense. I merged from the branch and that seemed to fix the issue with setup_config. I ran sp.py for Adelaide; it successfully made a gpkg, and I received the following output:

Global indicators project 2020-12-14

Process city: Adelaide

Prepare network resources...

  • Read network from disk.
  • Remove unnecessary key data from edges : 100%|███████████| 316022/316022 [00:02<00:00, 119817.27it/s]
  • Project graph
  • Ensure graph is undirected.
  • Save projected graphml to disk
  • 'VirtualXPath' [XML Path Language - XPath] Initialise sample point output geopackage as a copy of input geopackage
  • 'VirtualXPath' [XML Path Language - XPath]

First pass node-level neighbourhood analysis (Calculate average population and intersection density for each intersection node in study regions, taking mean values from distinct hexes within neighbourhood buffer distance)

  • Set up simple nodes
  • Generate 1000m neighbourhoods for nodes (All pairs Dijkstra shortest path analysis) : 111710nodes [04:28, 415.56nodes/s]
  • Summarise attributes (average value from unique associated hexes within nh buffer distance)... : 100%|█████████████████| 91712/91712 [21:37<00:00, 70.71it/s]
  • 'VirtualXPath' [XML Path Language - XPath] Time taken to calculate or load city local neighbourhood statistics: 27.8408 mins

Calculate assessbility to POIs.
Generating contraction hierarchies with 2 threads.
Setting CH node vector of size 111710
Setting CH edge vector of size 158509
Range graph removed 5798 edges of 317018
 . 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%

Calculating nearest node analyses ...

  • Open street map destinations ['fresh_food_market', 'convenience', 'pt_osm_any']

  • 'VirtualXPath' [XML Path Language - XPath]

  • Public open space ['public_open_space_any']

  • 'VirtualXPath' [XML Path Language - XPath] ['public_open_space_large']

  • 'VirtualXPath' [XML Path Language - XPath]

  • Public transport (GTFS) ['pt_gtfs_any', 'pt_gtfs_freq_30', 'pt_gtfs_freq_20']

Traceback (most recent call last):
  File "fiona/_shim.pyx", line 83, in fiona._shim.gdal_open_vector
  File "fiona/_err.pyx", line 291, in fiona._err.exc_wrap_pointer
fiona._err.CPLE_OpenFailedError: data/GTFS/gtfs_frequent_transit_headway_2020-08-27_python.gpkg: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sp.py", line 215, in <module>
    gdf_poi = gpd.read_file(f"data/{analysis['geopackage']}", layer = layer)
  File "/opt/conda/lib/python3.7/site-packages/geopandas/io/file.py", line 96, in _read_file
    with reader(path_or_bytes, **kwargs) as features:
  File "/opt/conda/lib/python3.7/site-packages/fiona/env.py", line 400, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/fiona/__init__.py", line 257, in open
    layer=layer, enabled_drivers=enabled_drivers, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/fiona/collection.py", line 164, in __init__
    self.session.start(self, **kwargs)
  File "fiona/ogrext.pyx", line 536, in fiona.ogrext.Session.start
  File "fiona/_shim.pyx", line 90, in fiona._shim.gdal_open_vector
fiona.errors.DriverError: data/GTFS/gtfs_frequent_transit_headway_2020-08-27_python.gpkg: No such file or directory

nicholas-david commented 3 years ago

When I tried to use the process_regions shell script, however, I got the following error:

process_regions.sh: line 2: $'\r': command not found
process_regions.sh: line 4: syntax error near unexpected token `$'do\r''
'rocess_regions.sh: line 4: `do

carlhiggs commented 3 years ago

Hi @nicholas-david,

The first issue

fiona.errors.DriverError: data/GTFS/gtfs_frequent_transit_headway_2020-08-27_python.gpkg: No such file or directory

relates to the first issue in the post above --- it doesn't sound like you've downloaded the GTFS input data that is required to run the process. I added the folder and config script to the repository, but it still requires that the GTFS frequent transit data be present.

I believe you have access to the CloudStor "Global Indicators - data" folder where inputs and outputs are stored. At the same level in the folder hierarchy (ie. in ./process/data) there should be folders for GTFS and, if you want to do pre-processing, also GHS and the boundaries geopackage --- and the OSM file dated 3 August 2020 (it was published on OSM planet archives on 13 August 2020). However, I have found that the OSM file is too large to upload; it can be retrieved from https://planet.osm.org/pbf/planet-200803.osm.pbf.torrent (note that the original pbf file is not up there any more, so you retrieve it e.g. using uTorrent). I believe I finally got all the GHS tiles uploaded, but this was a lot of data too, and in practice I kept getting timeouts.
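For orientation, the layout under ./process/data would then look roughly like this (a sketch assembled from the folders named above; exact file and folder names may differ):

./process/data/
    GTFS/                     # GTFS inputs and gtfs_config.py
    GHS/                      # GHS tiles (needed for pre-processing)
    <boundaries geopackage>   # study region boundaries (needed for pre-processing)
    planet-200803.osm.pbf     # OSM file dated 3 August 2020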

But the tl;dr for this point is --- I believe your problem will be fixed if you download the GTFS folder shown in the image below and place it in the data folder:

[image: the GTFS folder in the CloudStor data folder]

Regarding your second question, this relates to issues committing a unix-formatted shell script to this git repository --- I tried several times from different computers using unix-formatted line endings (\n), but it seemed to be replaced with windows-style line endings (\r\n) each time. I don't know why, but that is why you have the problem. While I use the shell files in the unix docker environment on my system, apparently some re-configuration of Git is required to get the appropriate line-ending behaviour for including shell files in the repository without messing up the formatting (e.g. see https://docs.github.com/en/free-pro-team@latest/github/using-git/configuring-git-to-handle-line-endings ). This wasn't a priority for me to fix as the files worked locally, but it is something someone should deal with at some point. If someone (eg. @gboeing ?) is more familiar with resolving this issue, I'm sure they could resolve it in no time!
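For what it's worth, one way to pin the line endings per the GitHub docs linked above would be a .gitattributes entry in the repository root (a sketch; assuming the repository doesn't already carry one):

# .gitattributes: keep shell scripts LF-terminated on commit and checkout
*.sh text eol=lf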

As a stop-gap for your own local purposes, if you open the file in your text editor, some editors can convert directly to unix line endings; otherwise, you could replace '\r\n' with '\n' and that would do the trick.
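For example, either of the following (run in the docker environment, assuming sed or dos2unix is available) would strip the carriage returns:

# remove the trailing \r from every line, in place
sed -i 's/\r$//' process_regions.sh
# or, if dos2unix is installed
dos2unix process_regions.sh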

Hope that helps,

Carl

gboeing commented 3 years ago

@nicholas-david in the meantime if you're using sublime you can convert windows line endings -> linux in one of the menus.

carlhiggs commented 3 years ago

As an aside regarding the "'VirtualXPath' [XML Path Language - XPath]" output noise: that seems to be a recently introduced artifact which arises through use of the geopackage / spatialite format, ie. when data is read in or written --- it didn't produce this output earlier in the year, but others have noted it in posts as unwanted noise, and there doesn't seem to be clear guidance on how to suppress it.

https://www.mail-archive.com/qgis-developer@lists.osgeo.org/msg51925.html
https://www.gaia-gis.it/fossil/libspatialite/tktview/760ef1affb822806191393ac3f208fc9d8647758

It's a shame -- the process output was a lot cleaner previously.

gboeing commented 3 years ago

@carlhiggs yeah I've been dealing with that CLI noise across all my projects over the past month or so... it's pretty annoying. I'm hoping dependency version bumps in coming months resolve it.