thinkingmachines / unicef-ai4d-poverty-mapping

UNICEF AI4D Relative Wealth Mapping Project - datasets, models, and scripts for building relative wealth estimation models across Southeast Asia (SEA)
https://thinkingmachines.github.io/unicef-ai4d-poverty-mapping
MIT License

Add feature_engineering module and test notebook #147

Closed tm-jc-nacpil closed 1 year ago

tm-jc-nacpil commented 1 year ago

This PR adds the feature_engineering.py module to povertymapping, alongside a test notebook that demonstrates its use.

Addresses #142

Example Usage

# Make the AOI for DHS
country_osm = "east-timor"
ookla_year = 2019
nightlights_year = 2016
col_rename_config = "tl"
dhs_household_dta_path = settings.DATA_DIR / "dhs/tl/TLHR71DT/TLHR71FL.DTA"
dhs_geographic_shp_path = settings.DATA_DIR / "dhs/tl/TLGE71FL/TLGE71FL.shp"

dhs_gdf = generate_dhs_cluster_level_data(
    dhs_household_dta_path,
    dhs_geographic_shp_path,
    col_rename_config=col_rename_config,
    convert_geoms_to_bbox=True,
    bbox_size_km=2.4,
).reset_index(drop=True)
country_data = dhs_gdf.copy()

# Generate the features for the AOI.
# scaled_only=True limits the output to the scaled features;
# features_only=True returns only the features and drops the columns of the AOI that was passed in.
features = generate_features(
    dhs_gdf, country_osm, ookla_year, nightlights_year, col_rename_config,
    scaled_only=True, features_only=True,
)

# Generate the labels by scaling the wealth index column of the AOI
labels = generate_labels(dhs_gdf, "Wealth Index")

# <model training code here>
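
To make the two flags in the example concrete, here is a rough sketch of the column selection they describe. It is not the code from this PR: the helper `select_output_columns`, the `_scaled` suffix, and the column names are all hypothetical, chosen only to illustrate the behavior described in the comment above.

```python
from typing import List, Sequence


def select_output_columns(
    aoi_columns: Sequence[str],
    feature_columns: Sequence[str],
    scaled_only: bool = False,
    features_only: bool = False,
    scaled_suffix: str = "_scaled",  # assumed naming convention, not confirmed by the PR
) -> List[str]:
    """Return the column names that would be kept for a given flag combination."""
    selected = list(feature_columns)
    if scaled_only:
        # Keep only the features that carry the (assumed) scaled suffix.
        selected = [c for c in selected if c.endswith(scaled_suffix)]
    if not features_only:
        # Keep the original AOI columns in front of the features.
        selected = list(aoi_columns) + selected
    return selected


# Illustrative column names only.
aoi_cols = ["DHSCLUST", "Wealth Index", "geometry"]
feature_cols = ["poi_count", "poi_count_scaled", "avg_rad_mean", "avg_rad_mean_scaled"]

print(select_output_columns(aoi_cols, feature_cols, scaled_only=True, features_only=True))
```

With both flags set, only the scaled feature columns remain, which is the shape a model-training step would typically consume.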

butchtm commented 1 year ago

This PR adds the feature_engineering.py module to povertymapping, alongside a test notebook that demonstrates its use.

  • generate_features: Generates the base features for an AOI based on DHS, OSM, Ookla, and VIIRS (nighttime lights) parameters. By default it uses StandardScaler but can use other sklearn scalers.
  • generate_labels: Generates the label data for an AOI given a column. By default, it uses StandardScaler.
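
For readers unfamiliar with the default scaling mentioned above, here is a minimal, dependency-free sketch of standardization, the transform that sklearn's StandardScaler applies: subtract the mean, then divide by the population standard deviation. The helpers `standard_scale`, `min_max_scale`, and `scale_column` are hypothetical and are not part of povertymapping. The actual functions are described as accepting sklearn scalers, and their signatures are not shown in this thread, so plain callables stand in for scaler objects here.

```python
from statistics import fmean, pstdev
from typing import Callable, List, Sequence


def standard_scale(values: Sequence[float]) -> List[float]:
    """Zero mean, unit variance, using the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale to the [0, 1] range, as an example of an alternative scaler."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_column(
    values: Sequence[float],
    scaler: Callable[[Sequence[float]], List[float]] = standard_scale,
) -> List[float]:
    """Scale one column, defaulting to standardization but accepting any scaler."""
    return scaler(values)


# Made-up wealth index values, for illustration only.
wealth_index = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(scale_column(wealth_index))
print(scale_column(wealth_index, scaler=min_max_scale))
```

Passing the scaler as an argument keeps the default simple while letting a caller swap in a different transform without changing the function itself.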

butchtm commented 1 year ago

Mistakenly closed instead of finishing my review. Reopening.

butchtm commented 1 year ago

Please check the parameters in the call to generate_features: it should not include col_rename_config, which is a parameter of generate_dhs_cluster_level_data.

tm-jc-nacpil commented 1 year ago

Hi @butchtm @alronlam, thanks for the catch regarding the cache_dir! Made the edits, can you check again now to see how it looks? :D