UrbanLandUse

Characterizing urban land use with machine learning

Summary

This repository contains a comprehensive set of instructions for creating and applying models that characterize land use / land cover (LULC) in urban areas using machine learning. The context and motivation for the project are described in detail in WRI technical note "Spatial Characterization of Urban Land Use through Machine Learning" (forthcoming in Q1 2020). The code presented here belongs to the revised and expanded methodology described in an addendum to that technical note (also forthcoming Q1 2020).

The core workflow is encapsulated in, and best understood through, a sequence of Jupyter notebooks. These notebooks import and use a number of accompanying modules: simple .py files stored in the utils folder. There is also one precursor step for processing the ground-truth data from the Atlas of Urban Expansion (AUE); this step was executed in QGIS through manual interaction combined with a sequence of short Python scripts.

Requirements

The libraries and packages required to execute the notebooks are listed in the imports block at the beginning of each. In general, these are standard geospatial and data analysis Python libraries.

Several parts of the workflow use the descarteslabs package for imagery retrieval and geospatial tiling. This is the Python API from Descartes Labs that provides access to its "data refinery" capabilities. Using the API requires registration and token generation, as described in the Descartes Labs documentation. Unaffiliated users may not have access to all offerings, such as certain remote sensing products.
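The snippet below is a minimal, illustrative sketch of the tiling and scene-search pattern that the descarteslabs client supports. The coordinates, resolution, and product id are placeholders rather than this project's actual parameters, and details of the API may differ between client versions.

```python
# Minimal sketch of geospatial tiling and scene search with the
# descarteslabs client (assumes prior authentication, e.g. via
# `descarteslabs auth login`). Coordinates, resolution, and product id
# below are illustrative placeholders, not the project's actual settings.
import descarteslabs as dl

# Define a square UTM tile around a point of interest:
# 256x256 pixels of data at 5 m resolution, with 8 pixels of padding.
tile = dl.scenes.DLTile.from_latlon(
    lat=9.03, lon=38.74,   # example coordinates (Addis Ababa)
    resolution=5.0,
    tilesize=256,
    pad=8,
)
print(tile.key, tile.bounds)

# Search the catalog for imagery intersecting that tile.
scenes, ctx = dl.scenes.search(
    tile,
    products=["sentinel-2:L1C"],       # example product id
    start_datetime="2019-01-01",
    end_datetime="2019-12-31",
    cloud_fraction=0.2,
)
print(f"found {len(scenes)} scenes")
```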

Environment

Nearly all workflows, including data preparation, model training, and performance assessment, were implemented in Jupyter notebooks running a Python 3.7 kernel within a custom conda environment; they were executed either within those notebooks or as standalone Python scripts. The computing environment was Debian (Linux) on a virtual machine hosted on Google Compute Engine, built from a Google disk image for machine learning. Modeling itself was conducted with the Keras library on top of TensorFlow. Training and model application used a single NVIDIA Tesla K80 GPU.
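As a quick sanity check that a comparable environment is configured correctly, a snippet along the following lines can confirm that TensorFlow detects the GPU (this assumes a TensorFlow 2.x install; TensorFlow 1.x exposes tf.test.is_gpu_available() for the same purpose).

```python
# Verify that TensorFlow can see the GPU before starting training.
# Assumes TensorFlow 2.x; adjust for 1.x as noted above.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```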

Utilizing a self-contained conda environment can help avoid versioning complications and compatibility problems between various libraries. To replicate precisely the conda environment used to develop this codebase, create an environment using the provided ulu_environment.yml file.
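For example, run `conda env create -f ulu_environment.yml`, then `conda activate <environment-name>`, where the environment name is the one defined inside the .yml file.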

Workflow

1. Prepare Atlas of Urban Expansion files (executed in QGIS)