Closed alronlam closed 1 year ago
Updated to use MinMaxScaler for labels. See the PNG files of the updated predicted wealth maps in the Files Changed tab.
The predicted wealth distributions look the same; only the values changed, and they now range from 0 to 1.
Also removed code related to split-quintile bins.
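For reference, a minimal sketch of why min-max scaling the labels changes the values but not the shape of the distribution. It uses plain NumPy as a stand-in for scikit-learn's MinMaxScaler with its default 0-1 range; the `min_max_scale` helper and the sample wealth index values are made up for illustration and are not the notebook's code or data.

```python
import numpy as np


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())


# Hypothetical wealth index labels (illustrative only, not survey data).
wealth_index = np.array([-1.5, -0.2, 0.0, 0.7, 2.5])
scaled = min_max_scale(wealth_index)

print(scaled)
# The ranking of areas is unchanged; only the scale of the values differs.
print(np.argsort(scaled), np.argsort(wealth_index))
```

Because the transform is linear and increasing, the relative ordering and spacing of areas are preserved, which is consistent with the maps looking the same after the change.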
tm-jace-peralta commented on 2023-03-24T02:55:02Z
Line #3. rollout_aoi["Predicted Relative Wealth Index"], split_quantile=False
In the cross-country notebooks, I edited the feature_engineering.categorize_wealth_index function to set split_quantile=False as the default, so it doesn't show up at all in the notebooks.
Updated the PR to change only the 2023-02-21 versions, since this was a minor code change, and to stay consistent with the cross-country model.
Problem:
The previous methodology, where we instantiate two separate scalers for ground truth data and rollout data, didn't work well: rollout predictions were mostly (if not all) positive, meaning every place has "above average" wealth. Intuitively, this shouldn't be the case, as most of the countries are developing countries.
Verified that the cause was the very different distributions of ground truth and rollout features. The rollout areas include many sparsely populated, remote areas with low feature values (low nighttime lights, internet speeds, and POIs), which pulled the mean down heavily for all features. See details below.
An example implication is that the same feature value will be interpreted differently by the model during training vs. rollout. E.g., for the Philippine data, if a place has a POI count of 50:
There is a discrepancy even though it's the same POI count, which will confuse the model. I.e., if the mean is so low during rollout (because most areas have 0 values), most areas will have "above average" feature values, and the model will think their wealth is "above average" too.
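The mismatch can be reproduced with a small sketch. It assumes the two scalers standardize each feature by its own mean and standard deviation (as scikit-learn's StandardScaler does); the `standardize` helper and the POI counts below are made up for illustration and are not the project's code or data.

```python
import numpy as np


def standardize(value, reference):
    """Z-score of `value` relative to the mean and std of `reference`."""
    return (value - reference.mean()) / reference.std()


# Hypothetical POI counts (illustrative only).
train_poi = np.array([20.0, 40.0, 50.0, 60.0, 80.0])  # ground truth areas
rollout_poi = np.array([0.0] * 8 + [50.0, 100.0])     # mostly remote areas with no POIs

# The same raw POI count of 50, scaled by each separately fitted scaler.
z_train = standardize(50.0, train_poi)
z_rollout = standardize(50.0, rollout_poi)

print(f"scaled with ground truth statistics: {z_train:.2f}")
print(f"scaled with rollout statistics:      {z_rollout:.2f}")
```

With these made-up numbers, a POI count of 50 sits exactly at the ground truth mean but more than one standard deviation above the rollout mean, so a model trained on the first scale would read the same place as well above average at rollout time.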
Cambodia
Philippines
Myanmar
Timor Leste
Solution:
Update: 2023-03-23 As agreed, scaling was updated to MinMax for uniformity with cross-country rollouts. The images and histograms here are outdated, though the change is quite minor (only the actual values changed; the distribution and visualizations look largely the same).
Cambodia
Myanmar
Philippines
Timor Leste