Random forest does not require feature normalization, but in our case the consolidated survey data is not sampled from a single population (country). A model trained on samples from one country is therefore unlikely to predict well when tested on samples from another country.
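One way to test this claim is to hold out one country at a time, train on the remaining countries, and score on the held-out one. The sketch below is a minimal illustration under stated assumptions: the data is a pandas DataFrame with a `country` column, `make_toy_data` and `leave_one_country_out` are hypothetical helpers, and the toy data is synthetic (country "B" shares the same wealth signal as "A" and "C" but has a shifted nightlights baseline), not our survey data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut


def make_toy_data(seed=0, n=100):
    """Hypothetical data: "B" has the same wealth signal but a shifted nightlights baseline."""
    rng = np.random.default_rng(seed)
    frames = []
    for country, shift in [("A", 0.0), ("B", 5.0), ("C", 0.0)]:
        lights = rng.normal(shift, 1.0, n)
        wealth = (lights - shift) + rng.normal(0.0, 0.1, n)
        frames.append(pd.DataFrame({"country": country, "nightlights": lights, "wealth": wealth}))
    return pd.concat(frames, ignore_index=True)


def leave_one_country_out(df, feature_cols, target_col, country_col="country"):
    """Train on all countries but one and report R^2 on the held-out country."""
    X = df[feature_cols].to_numpy()
    y = df[target_col].to_numpy()
    groups = df[country_col].to_numpy()
    scores = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        held_out = groups[test_idx][0]
        scores[held_out] = r2_score(y[test_idx], model.predict(X[test_idx]))
    return scores


if __name__ == "__main__":
    print(leave_one_country_out(make_toy_data(), ["nightlights"], "wealth"))
```

On this toy data the country with the shifted baseline should score a much lower R^2 than the others, because its feature values fall outside the range the forest saw during training.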
Normalization helps by correcting country-specific feature quirks, putting each country's feature values on a common scale:
- PH samples generally have higher nightlights luminosity and road counts, most evidently in the lower wealth quintiles
- When normalized, each country has its own baseline
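To illustrate the "own baseline" point, here is a minimal per-country z-score in pandas. It assumes a DataFrame with a `country` column; `normalize_per_country` is a hypothetical helper and the toy values are made up so that "PH" simply has a higher nightlights level than a placeholder country "XX".

```python
import pandas as pd


def normalize_per_country(df, feature_cols, country_col="country"):
    """Z-score each feature using its own country's mean and population SD."""
    by_country = df[country_col]
    centered = df[feature_cols] - df.groupby(country_col)[feature_cols].transform("mean")
    # Population SD (ddof=0), the same convention sklearn's StandardScaler uses.
    sd = centered.pow(2).groupby(by_country).transform("mean").pow(0.5)
    # A feature that is constant within a country gets SD 1 instead of dividing by 0.
    sd = sd.replace(0.0, 1.0)
    out = df.copy()
    out[feature_cols] = centered / sd
    return out


if __name__ == "__main__":
    # Hypothetical values: "PH" has a higher nightlights level than "XX".
    toy = pd.DataFrame({
        "country": ["PH", "PH", "PH", "XX", "XX", "XX"],
        "nightlights": [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
    })
    print(normalize_per_country(toy, ["nightlights"]))
```

After normalization both countries map to the same values (about -1.22, 0, 1.22): each row is now read relative to its own country's baseline rather than the pooled one.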
Reference: Chi, G. et al. (2022) “Microestimates of wealth for all low- and middle-income countries,” Proceedings of the National Academy of Sciences, 119(3). Available at: https://doi.org/10.1073/pnas.2113658119.
- Wealth estimation for low- and middle-income countries using DHS data: 135 countries
- Used a StandardScaler per country: all input features are normalized by subtracting the country-specific mean and dividing by the country-specific standard deviation (SD)
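A per-country StandardScaler can be sketched with scikit-learn by fitting one scaler per country and keeping it, so later samples from the same country are transformed with that country's training statistics. This is not Chi et al.'s code; `PerCountryScaler` is a hypothetical wrapper and the `country`, `nightlights`, and `roads` columns are assumed names.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


class PerCountryScaler:
    """Hypothetical helper: one StandardScaler per country, applied to that country's rows."""

    def __init__(self, feature_cols, country_col="country"):
        self.feature_cols = list(feature_cols)
        self.country_col = country_col
        self.scalers_ = {}

    def fit(self, df):
        # Learn a separate mean and SD for every country.
        for country, group in df.groupby(self.country_col):
            self.scalers_[country] = StandardScaler().fit(group[self.feature_cols])
        return self

    def transform(self, df):
        out = df.copy()
        # Cast to float so scaled values are not forced into integer columns.
        out[self.feature_cols] = out[self.feature_cols].astype(float)
        for country, idx in df.groupby(self.country_col).groups.items():
            # A country not seen during fit raises KeyError (no stats to use).
            scaler = self.scalers_[country]
            out.loc[idx, self.feature_cols] = scaler.transform(df.loc[idx, self.feature_cols])
        return out

    def fit_transform(self, df):
        return self.fit(df).transform(df)


if __name__ == "__main__":
    toy = pd.DataFrame({
        "country": ["PH"] * 3 + ["XX"] * 3,
        "nightlights": [10, 20, 30, 1, 2, 3],
        "roads": [5, 7, 9, 0, 1, 2],
    })
    print(PerCountryScaler(["nightlights", "roads"]).fit_transform(toy))
```

Keeping the fitted scalers (rather than recomputing statistics on whatever data is passed in) avoids leaking test-set statistics into the normalization.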
Explore: outlier detection methods per country
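As one possible starting point for per-country outlier detection (an assumption, not a method chosen in these notes), Tukey's IQR fences can be computed within each country. `flag_outliers_per_country` is a hypothetical helper, `k=1.5` is the conventional multiplier, and the road counts are made-up toy values.

```python
import pandas as pd


def flag_outliers_per_country(df, col, country_col="country", k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR], computed per country."""
    grouped = df.groupby(country_col)[col]
    # Each lambda returns one quantile per country, broadcast back to that country's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)


if __name__ == "__main__":
    # Hypothetical road counts: 50-54 is typical for "XX" but would be extreme for "PH".
    toy = pd.DataFrame({
        "country": ["PH"] * 5 + ["XX"] * 5,
        "roads": [1, 2, 3, 4, 100, 50, 51, 52, 53, 54],
    })
    print(toy[flag_outliers_per_country(toy, "roads")])
```

Here only the PH value of 100 is flagged. Applying the same rule to the pooled data (both countries treated as one group) flags nothing, because the mix of countries inflates the IQR.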
Normalization per country:
Next step: normalize each feature, as well as the wealth index, per country (as documented in issue #112)
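The notes do not say how the wealth index itself will be normalized. This sketch assumes the same per-country z-score used for the features, and keeps the per-country statistics so that model predictions (which would then be relative, within-country wealth) can be mapped back to each country's original scale. Both functions are hypothetical, and the `country` and `wealth` column names are assumptions.

```python
import pandas as pd


def normalize_target_per_country(df, target_col, country_col="country"):
    """Z-score the target within each country and return the per-country stats used."""
    stats = df.groupby(country_col)[target_col].agg(mean="mean", sd=lambda s: s.std(ddof=0))
    # A country with a constant target gets SD 1 instead of dividing by 0.
    stats["sd"] = stats["sd"].replace(0.0, 1.0)
    countries = df[country_col]
    z = (df[target_col] - countries.map(stats["mean"])) / countries.map(stats["sd"])
    return z, stats


def denormalize_per_country(z, countries, stats):
    """Map per-country z-scores (e.g. model predictions) back to each country's scale."""
    return z * countries.map(stats["sd"]) + countries.map(stats["mean"])


if __name__ == "__main__":
    # Hypothetical wealth index values on very different per-country scales.
    toy = pd.DataFrame({"country": ["PH", "PH", "XX", "XX"], "wealth": [1.0, 3.0, 10.0, 30.0]})
    z, stats = normalize_target_per_country(toy, "wealth")
    print(z.tolist())
    print(denormalize_per_country(z, toy["country"], stats).tolist())
```

A trade-off worth noting: once the target is normalized per country, the model can only rank households within a country, so cross-country comparisons depend on the stored statistics.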