Setup | Code Organization | Data Sources | Models | Key Results | Acknowledgements
This repository accompanies our research work, "Mapping Philippine Poverty using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information".
The goal of this project is to provide a means for faster, cheaper, and more granular estimation of poverty measures in the Philippines using machine learning, satellite imagery, and open geospatial data.
To get started, run the Jupyter notebooks in notebooks/ in order.
Note that all dependencies must be installed before you can run the notebooks. We provide a Makefile to accomplish this task:
make venv
make build
This creates a virtual environment, venv, and installs all dependencies found in requirements.txt. In order to run the notebooks inside venv, execute the following command:
ipython kernel install --user --name=venv
Notable dependencies include:
This repository is divided into three main parts:
You can follow our experiments and reproduce the models we've built by going through the notebooks one by one. For model training, we used a Google Compute Engine (GCE) n1-standard-16 instance (16 vCPUs, 60 GB of memory) with an NVIDIA Tesla P100 GPU.
We used the poverty indicators in the 2017 Philippine Demographic and Health Survey as ground truth for socioeconomic indicators. The survey is conducted every 3 to 5 years and contains nationally representative information on different indicators across the country.
Due to data access agreements, users need to independently download data files from the Demographic and Health Survey Website. This may require you to create an account and fill out a Data User Agreement form.
Once downloaded, copy and unzip the file in the /data directory. The notebook /notebooks/00_dhs_prep.ipynb will walk you through how to prepare the dataset for modeling.
We used the Google Static Maps API to download 400x400 px satellite images at zoom level 17. To download the satellite images and generate training/validation sets, run the following script in src/:
python data_download.py
Note that this script downloads 134,540 satellite images from Google Static Maps and may incur costs. See Google's Maps Static API Usage and Billing page for more information.
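For orientation, here is a minimal sketch of how a single request for one 400x400 px, zoom 17 satellite image could be built against the Maps Static API. It is not taken from data_download.py, whose internals may differ. The function name `build_static_map_url`, the example coordinates, and the placeholder key are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base endpoint of the Maps Static API.
STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def build_static_map_url(lat, lon, api_key, zoom=17, size_px=400):
    """Return a Maps Static API URL for one satellite image centered on (lat, lon).

    Hypothetical helper: the defaults mirror the zoom level and image size
    stated in this README, not necessarily the code in data_download.py.
    """
    params = {
        "center": f"{lat:.6f},{lon:.6f}",
        "zoom": zoom,
        "size": f"{size_px}x{size_px}",
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative coordinates and a placeholder key; replace with your own.
    print(build_static_map_url(14.5995, 120.9842, api_key="YOUR_API_KEY"))
```

Each generated URL corresponds to one billable request, so it can be worth building and inspecting the full list of URLs before downloading anything.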
To train the nighttime lights transfer learning model, run the following script in src/:
python train.py
Usage is as follows:
usage: train.py [-h] [--batch-size N] [--lr LR] [--epochs N] [--factor N]
                [--patience N] [--data-dir S] [--model-best-dir S]
                [--checkpoint-dir S]
Philippine Poverty Prediction
optional arguments:
  -h, --help          show this help message and exit
  --batch-size N      input batch size for training (default: 32)
  --lr LR             learning rate (default: 1e-6)
  --epochs N          number of epochs to train (default: 100)
  --factor N          factor to reduce learning rate by on plateau (default:
                      0.1)
  --patience N        number of iterations before reducing lr (default: 10)
  --data-dir S        data directory (default: "../data/images/")
  --model-best-dir S  best model path (default: "../models/model.pt")
  --checkpoint-dir S  model directory (default: "../models/")
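The --factor and --patience options describe a reduce-on-plateau learning rate schedule. The sketch below illustrates how such a schedule typically behaves. It is a simplified, pure-Python stand-in and not the code in train.py: the function name `reduce_lr_on_plateau` is hypothetical, and the rule "reduce once the loss has not improved for more than `patience` consecutive epochs" is an assumption about the intended semantics.

```python
def reduce_lr_on_plateau(val_losses, lr=1e-6, factor=0.1, patience=10):
    """Return the learning rate in effect after each epoch.

    Hypothetical illustration: the rate is multiplied by `factor` whenever
    the validation loss has failed to improve for more than `patience`
    consecutive epochs. Defaults mirror the documented train.py defaults.
    """
    best = float("inf")
    stale_epochs = 0  # epochs since the last improvement
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
        if stale_epochs > patience:
            lr *= factor
            stale_epochs = 0  # start counting again after a reduction
        history.append(lr)
    return history


if __name__ == "__main__":
    # With patience=2, the rate drops after the third epoch without improvement.
    losses = [1.0, 0.9, 0.9, 0.9, 0.9]
    print(reduce_lr_on_plateau(losses, lr=1.0, factor=0.5, patience=2))
```

With the documented defaults, a plateau lasting more than 10 epochs would take the learning rate from 1e-6 to 1e-7 under this rule. Production schedulers usually add details such as an improvement threshold and a cooldown period, which this sketch omits.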
We developed wealth prediction models using different data sources. You can follow our analysis by going through the notebooks in the notebooks/ directory.
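As one illustration of the feature engineering involved, the sketch below counts OSM points of interest (POIs) around a survey cluster, using the radii given for the OSM model in this section (5 km for rural areas, 2 km for urban areas). It is a simplified example under stated assumptions, not the notebooks' implementation: the dictionary fields (`type`, `lat`, `lon`, `category`), the function names, and the use of haversine distance are all hypothetical choices.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Radii stated in this README for the OSM features.
RADIUS_KM = {"rural": 5.0, "urban": 2.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_pois_near(cluster, pois):
    """Count POIs per category that fall within the cluster's radius.

    `cluster` and `pois` use a hypothetical schema chosen for this example.
    """
    radius = RADIUS_KM[cluster["type"]]
    counts = {}
    for poi in pois:
        dist = haversine_km(cluster["lat"], cluster["lon"], poi["lat"], poi["lon"])
        if dist <= radius:
            counts[poi["category"]] = counts.get(poi["category"], 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up POIs: at the cluster center, about 3.3 km away, about 11 km away.
    pois = [
        {"lat": 14.60, "lon": 121.0, "category": "school"},
        {"lat": 14.63, "lon": 121.0, "category": "bank"},
        {"lat": 14.70, "lon": 121.0, "category": "school"},
    ]
    for kind in ("urban", "rural"):
        cluster = {"type": kind, "lat": 14.60, "lon": 121.0}
        print(kind, count_pois_near(cluster, pois))
```

The resulting per-category counts are the sort of tabular features a random forest regressor can consume directly. Road and building features would need length or area aggregation rather than simple counts.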
Nighttime lights transfer learning model (notebooks/03_transfer_model.ipynb): we used a transfer learning approach proposed by Xie et al. and Jean et al. The main assumption here is that nighttime lights act as a good proxy for economic activity. We started with a Convolutional Neural Network (CNN) pre-trained on ImageNet, and used the feature embeddings as input into a ridge regression model.
Nighttime lights statistics model (notebooks/01_lights_eda.ipynb, notebooks/03_lights_model.ipynb): in this model, we generated nighttime light features consisting of summary statistics and histogram-based features. We then compared the performance of three machine learning algorithms: ridge regression, random forest regressor, and gradient boosting (XGBoost).
OpenStreetMap (OSM) model (notebooks/04_osm_model.ipynb): we extracted three types of OSM features, namely roads, buildings, and points of interest (POIs), within a 5-km radius for rural areas and a 2-km radius for urban areas. We then trained a random forest regressor on these features.
OSM and nighttime lights model (notebooks/02_lights_eda.ipynb, notebooks/04_osm_model.ipynb): we also trained a random forest model combining OSM data and nighttime lights-derived features as input.
Use this BibTeX entry to cite this repository:
@misc{ph_poverty_prediction_2018,
title={Mapping Poverty in the Philippines Using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information},
author={Tingzon, Isabelle and Orden, Ardie and Sy, Stephanie and Sekara, Vedran and Weber, Ingmar and Fatehkia, Masoomali and Herranz, Manuel Garcia and Kim, Dohyung},
year={2018},
publisher={Github},
journal={GitHub repository},
howpublished={\url{https://github.com/thinkingmachines/ph-poverty-mapping}},
}
This work was supported by the UNICEF Innovation Fund.