thinkingmachines / unicef-ai4d-poverty-mapping

UNICEF AI4D Relative Wealth Mapping Project - datasets, models, and scripts for building relative wealth estimation models across Southeast Asia (SEA)
https://thinkingmachines.github.io/unicef-ai4d-poverty-mapping
MIT License

Remove scaling for DHS country rollouts #173

Closed: alronlam closed this pull request 1 year ago

alronlam commented 1 year ago

Problem:

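When the rollout features are scaled with statistics computed on the rollout data itself, the same raw value (e.g., the same POI count) maps to a different scaled value than it did in training, which will confuse the model. For example, because most rollout areas have 0 values, the rollout mean is very low, so most areas end up with "above average" scaled feature values, and the model will in turn predict "above average" wealth.

Per-feature mean and standard deviation of the training (TRAIN) and rollout (ROLLOUT) data, by country: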

Cambodia

avg_rad_mean TRAIN mean = 1.50, std = 5.30
avg_rad_mean ROLLOUT mean = 0.05, std = 0.62

mobile_2019_mean_avg_d_kbps_mean TRAIN mean = 322.63, std = 413.98
mobile_2019_mean_avg_d_kbps_mean ROLLOUT mean = 157.14, std = 471.79

fixed_2019_mean_avg_d_kbps_mean TRAIN mean = 234.58, std = 276.92
fixed_2019_mean_avg_d_kbps_mean ROLLOUT mean = 73.34, std = 290.23

poi_count TRAIN mean = 46.25, std = 201.90
poi_count ROLLOUT mean = 0.81, std = 19.13

road_count TRAIN mean = 101.05, std = 200.60
road_count ROLLOUT mean = 9.75, std = 37.72

Philippines

avg_rad_mean TRAIN mean = 4.62, std = 9.54
avg_rad_mean ROLLOUT mean = 0.51, std = 1.72

mobile_2019_mean_avg_d_kbps_mean TRAIN mean = 343.36, std = 381.64
mobile_2019_mean_avg_d_kbps_mean ROLLOUT mean = 157.89, std = 400.67

fixed_2019_mean_avg_d_kbps_mean TRAIN mean = 350.35, std = 357.25
fixed_2019_mean_avg_d_kbps_mean ROLLOUT mean = 129.71, std = 294.39

poi_count TRAIN mean = 52.97, std = 137.45
poi_count ROLLOUT mean = 2.84, std = 24.08

road_count TRAIN mean = 266.70, std = 441.41
road_count ROLLOUT mean = 33.43, std = 102.69

Myanmar

avg_rad_mean TRAIN mean = 1.57, std = 4.13
avg_rad_mean ROLLOUT mean = 0.32, std = 1.15

mobile_2019_mean_avg_d_kbps_mean TRAIN mean = 457.56, std = 554.62
mobile_2019_mean_avg_d_kbps_mean ROLLOUT mean = 224.68, std = 511.29

fixed_2019_mean_avg_d_kbps_mean TRAIN mean = 123.31, std = 266.52
fixed_2019_mean_avg_d_kbps_mean ROLLOUT mean = 32.23, std = 183.22

poi_count TRAIN mean = 14.44, std = 53.32
poi_count ROLLOUT mean = 1.07, std = 13.75

road_count TRAIN mean = 76.99, std = 164.38
road_count ROLLOUT mean = 15.37, std = 44.64

Timor Leste

avg_rad_mean TRAIN mean = 0.96, std = 2.66
avg_rad_mean ROLLOUT mean = 0.10, std = 0.56

mobile_2019_mean_avg_d_kbps_mean TRAIN mean = 129.21, std = 255.05
mobile_2019_mean_avg_d_kbps_mean ROLLOUT mean = 38.11, std = 166.70

fixed_2019_mean_avg_d_kbps_mean TRAIN mean = 19.18, std = 61.11
fixed_2019_mean_avg_d_kbps_mean ROLLOUT mean = 3.19, std = 39.92

poi_count TRAIN mean = 4.33, std = 14.52
poi_count ROLLOUT mean = 0.39, std = 3.28

road_count TRAIN mean = 41.90, std = 87.16
road_count ROLLOUT mean = 8.67, std = 19.63
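To make the discrepancy concrete, here is a minimal sketch assuming each split is standardized (z-scored) with its own mean and standard deviation. It reuses the Cambodia poi_count statistics listed above; the raw count of 10 POIs is a hypothetical example tile, and `zscore` is an illustrative helper, not project code.

```python
# Cambodia poi_count statistics from the listing above (mean, std).
TRAIN_MEAN, TRAIN_STD = 46.25, 201.90
ROLLOUT_MEAN, ROLLOUT_STD = 0.81, 19.13


def zscore(x: float, mean: float, std: float) -> float:
    """Standardize x using the given mean and standard deviation."""
    return (x - mean) / std


raw_poi_count = 10  # hypothetical tile with 10 POIs

# Assumption: each split is scaled with statistics computed on that split only.
train_z = zscore(raw_poi_count, TRAIN_MEAN, TRAIN_STD)
rollout_z = zscore(raw_poi_count, ROLLOUT_MEAN, ROLLOUT_STD)

print(f"scaled with TRAIN stats:   {train_z:+.2f}")    # -0.18
print(f"scaled with ROLLOUT stats: {rollout_z:+.2f}")  # +0.48
# Without scaling, the model sees the same raw value (10) in both cases.
```

The same tile looks below average on the training scale but above average on the rollout scale, which is the kind of shift described in the problem statement. Leaving the feature unscaled keeps the value identical in both settings.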

Solution:

Update (2023-03-23): As agreed, the scaling was changed to MinMax for uniformity with the cross-country rollouts. The images and histograms here are outdated, though the change is minor: only the actual values changed, while the distributions and visualizations look largely the same.
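For reference, min-max scaling maps each value x to (x - min) / (max - min), so the outputs fall between 0 and 1. The sketch below is a plain-Python stand-in for scikit-learn's `MinMaxScaler` with its default `feature_range=(0, 1)`, applied to hypothetical predicted wealth values; `min_max_scale` is an illustrative name, not the project's code.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical predicted relative wealth values for a handful of tiles.
predictions = [-1.2, 0.3, 0.8, 2.4]
scaled = min_max_scale(predictions)
print(scaled)
```

Because the map is linear and increasing, the lowest prediction becomes 0, the highest becomes 1, and the ordering of tiles is unchanged.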

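Cambodia

[Images: predicted wealth map (cambodia_new) and histogram (kh_hist)]

Myanmar

[Images: predicted wealth map (mm_new) and histogram (mm_hist)]

Philippines

[Images: predicted wealth map (2023-03-21-ph-predicted-wealth-index) and histogram (ph_hist)]

Timor Leste

[Images: predicted wealth map (east_timor_new) and histogram (tl_hist)]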

alronlam commented 1 year ago

Updated to use MinMaxScaler for labels. See PNG files of the updated predicted wealth maps in the Files Changed tab.

The predicted wealth distributions look the same; only the values changed, and they now range from 0 to 1.

Also removed code related to split-quintile bins.
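The observation that the distributions look the same after min-max scaling follows from the scaling being a positive affine map of the raw predictions. Writing $x_{\min}$ and $x_{\max}$ for the smallest and largest predicted values (assumed distinct):

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} = a\,x + b,
\qquad a = \frac{1}{x_{\max} - x_{\min}} > 0,
\qquad b = -\frac{x_{\min}}{x_{\max} - x_{\min}}
```

Since $a > 0$, $x_i < x_j$ implies $x'_i < x'_j$, so the ranking of areas is preserved and the histogram keeps its shape; the axis is only shifted and stretched onto $[0, 1]$.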

review-notebook-app[bot] commented 1 year ago

tm-jace-peralta commented on 2023-03-24T02:55:02Z

Line #3: rollout_aoi["Predicted Relative Wealth Index"], split_quantile=False

In the cross-country notebooks, I edited the feature_engineering.categorize_wealth_index function to make split_quantile=False the default, so the argument doesn't need to appear in the notebooks at all.


alronlam commented 1 year ago

Updated the PR to only update the 2023-02-21 versions, since this was a minor code change, and to be consistent with the cross-country model.