thinkingmachines / unicef-ai4d-poverty-mapping

UNICEF AI4D Relative Wealth Mapping Project - datasets, models, and scripts for building relative wealth estimation models across Southeast Asia (SEA)
https://thinkingmachines.github.io/unicef-ai4d-poverty-mapping
MIT License

feat: Add model training notebook (PH) #76

Closed tm-jc-nacpil closed 1 year ago

tm-jc-nacpil commented 1 year ago

Overview

Addresses #72

This PR adds the initial end-to-end model training notebook for the Philippines. The notebook processes the features for the PH DHS clusters, prepares the data, runs a (simple) cross-validation evaluation, and finally trains and evaluates a random forest model that predicts the wealth index.
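For readers without the notebook open, the flow described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the feature matrix, target, split sizes, and hyperparameters are all assumptions made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the per-cluster feature table (hypothetical data,
# not the real DHS cluster features or wealth index).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

# Hold out a test split before any model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Simple 5-fold cross-validation on the training split, scored with R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X_train,
    y_train,
    cv=cv,
    scoring="r2",
)

# Final model: fit on the full training split, evaluate on the held-out split.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(f"CV R^2: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Held-out R^2: {test_r2:.3f}")
```

Reporting both the cross-validation spread and the held-out score makes it easier to tell whether a good test number is just a lucky split.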

Input

Output


Notes

alronlam commented 1 year ago

In summary, I think what's important is we use SHAP for feature importance. Random forest feature importance can be quite unstable. Good to do it soon so that's what we report in our sprint review.

Scroll down to the SHAP part to see simple 1/2-liners on how to produce a SHAP Chart. https://mljar.com/blog/feature-importance-in-random-forest/

We should probably save this for later since it takes more time, but we could also create a modified version of the bar chart in which each bar is colored by whether the feature (in general) pushes the wealth estimate higher or lower. Reference code for generating simplified SHAP charts from our air quality repo:

The other comments we can probably improve later on :D
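On the direction-colored bar chart idea: one possible way to do it is sketched below. This is not the air quality repo's code. It assumes "direction" means the sign of the covariance between a feature's values and its SHAP values, which is only one reasonable definition. The function names and colors are made up for illustration.

```python
from statistics import fmean

# Hypothetical color scheme: red raises the estimate, blue lowers it.
COLORS = {"raises": "tab:red", "lowers": "tab:blue", "flat": "tab:gray"}


def summarize_shap(shap_values, X, feature_names):
    """Per feature: mean |SHAP| and the general direction of its effect.

    shap_values and X are 2D (n_samples x n_features), given as lists of
    rows or NumPy arrays.
    """
    rows = []
    # zip(*matrix) walks the matrix column by column.
    for name, s_col, x_col in zip(feature_names, zip(*shap_values), zip(*X)):
        s_mean, x_mean = fmean(s_col), fmean(x_col)
        # Sign of the covariance: do higher feature values come with
        # higher SHAP values (raises) or lower ones (lowers)?
        cov = sum((x - x_mean) * (s - s_mean) for x, s in zip(x_col, s_col))
        direction = "raises" if cov > 0 else "lowers" if cov < 0 else "flat"
        rows.append(
            {
                "feature": name,
                "mean_abs_shap": fmean(abs(s) for s in s_col),
                "direction": direction,
                "color": COLORS[direction],
            }
        )
    # Most important feature first.
    rows.sort(key=lambda r: r["mean_abs_shap"], reverse=True)
    return rows


def plot_summary(rows):
    """Horizontal bar chart of mean |SHAP|, colored by direction."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    fig, ax = plt.subplots()
    ax.barh(
        [r["feature"] for r in rows],
        [r["mean_abs_shap"] for r in rows],
        color=[r["color"] for r in rows],
    )
    ax.invert_yaxis()  # largest bar on top
    ax.set_xlabel("mean |SHAP value|")
    return ax
```

A single sign per feature hides non-monotonic effects, so the colored chart works best next to the standard SHAP summary plot rather than replacing it.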