tm-jc-nacpil closed this pull request 1 year ago.
This PR adds the `feature_engineering.py` module to `povertymapping`, alongside a test notebook that demonstrates its use.
- `generate_features`: Generates the base features for an AOI from DHS, OSM, Ookla, and VIIRS (nighttime lights) data. By default it uses `StandardScaler`, but it can use other sklearn scalers.
- `generate_labels`: Generates the label data for an AOI from a given column. By default, it uses `StandardScaler`.

Example Usage
```python
# Assumption: module path inferred from this PR (feature_engineering.py in povertymapping).
from povertymapping.feature_engineering import generate_features, generate_labels
# `settings` and `generate_dhs_cluster_level_data` are also assumed to come from
# povertymapping; their exact module paths are not shown in this PR.

# Make the AOI for DHS
country_osm = "east-timor"
ookla_year = 2019
nightlights_year = 2016
col_rename_config = "tl"
dhs_household_dta_path = settings.DATA_DIR / "dhs/tl/TLHR71DT/TLHR71FL.DTA"
dhs_geographic_shp_path = settings.DATA_DIR / "dhs/tl/TLGE71FL/TLGE71FL.shp"
dhs_gdf = generate_dhs_cluster_level_data(
    dhs_household_dta_path,
    dhs_geographic_shp_path,
    col_rename_config=col_rename_config,
    convert_geoms_to_bbox=True,
    bbox_size_km=2.4,
).reset_index(drop=True)
country_data = dhs_gdf.copy()

# Generate the features for the AOI.
# scaled_only limits the returned features to scaled features only, while
# features_only returns only features and excludes the columns of the passed AOI.
# col_rename_config belongs to generate_dhs_cluster_level_data, so it is not passed here.
features = generate_features(
    dhs_gdf,
    country_osm,
    ookla_year,
    nightlights_year,
    scaled_only=True,
    features_only=True,
)

# Generate the labels by scaling the wealth index column of the AOI data
labels = generate_labels(country_data, "Wealth Index")

# <model training code here>
```
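To make the default-scaler and column-selection behavior described above more concrete, here is a minimal sketch of how a scaling step and the `scaled_only` / `features_only` flags could interact. It is not the code in `feature_engineering.py`: the helper `scale_and_select`, the `_scaled` column suffix, and the column names `avg_rad` and `poi_count` are hypothetical, and it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def scale_and_select(aoi, feature_cols, scaler=None, scaled_only=False, features_only=False):
    """Hypothetical helper: scale feature columns, then choose which columns to return.

    Assumption: scaled copies are stored next to the raw features with a "_scaled" suffix.
    """
    # Default to StandardScaler, but accept any sklearn scaler exposing fit_transform.
    scaler = scaler if scaler is not None else StandardScaler()
    out = aoi.copy()
    scaled_values = scaler.fit_transform(out[feature_cols])  # returns a NumPy array
    scaled_cols = [f"{col}_scaled" for col in feature_cols]
    for i, col in enumerate(scaled_cols):
        out[col] = scaled_values[:, i]

    # scaled_only drops the raw (unscaled) feature columns.
    keep_features = scaled_cols if scaled_only else feature_cols + scaled_cols
    if features_only:
        # features_only drops the AOI's own columns (e.g. cluster IDs, labels).
        return out[keep_features]
    aoi_cols = [c for c in out.columns if c not in feature_cols + scaled_cols]
    return out[aoi_cols + keep_features]


# Usage with a tiny, made-up AOI table.
aoi = pd.DataFrame({
    "cluster_id": [1, 2, 3],
    "avg_rad": [0.5, 1.5, 2.5],  # hypothetical nighttime-lights feature
    "poi_count": [10, 20, 30],   # hypothetical OSM feature
})
features = scale_and_select(aoi, ["avg_rad", "poi_count"], scaled_only=True, features_only=True)
print(list(features.columns))  # ['avg_rad_scaled', 'poi_count_scaled']

# Swapping in another sklearn scaler only changes the `scaler` argument.
minmax = scale_and_select(aoi, ["avg_rad"], scaler=MinMaxScaler())
```

Taking the scaler as an argument keeps the StandardScaler default while letting callers pass any other sklearn scaler without changing the function. Applied to a single column such as `["Wealth Index"]`, the same pattern is one plausible reading of what `generate_labels` is described as doing.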
Mistakenly closed instead of finishing my review. Reopening.
Please check the parameters in the call to `generate_features`: it should not include `col_rename_config`, which is a parameter of `generate_dhs_cluster_level_data`.
Hi @butchtm @alronlam, thanks for the catch regarding the cache_dir! Made the edits, can you check again now to see how it looks? :D
Addresses #142