thinkingmachines / unicef-ai4d-poverty-mapping

UNICEF AI4D Relative Wealth Mapping Project - datasets, models, and scripts for building relative wealth estimation models across Southeast Asia (SEA)
https://thinkingmachines.github.io/unicef-ai4d-poverty-mapping
MIT License

DHS Cross Country Data Manager #113

Closed tm-jc-nacpil closed 1 year ago

tm-jc-nacpil commented 1 year ago

Overview

This PR adds the class DHSDataManager to the povertymapping.dhs module. The class loads DHS data for individual countries while keeping track of which country each dataset belongs to, which enables cross-country operations down the line. It replicates the original functions as closely as possible, so existing usage patterns carry over. The key difference is that each call must pass a user-defined country_index string labelling the data file being loaded (e.g. Philippines, Cambodia) so that the manager can keep track of the dataframes.
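To illustrate the bookkeeping idea, here is a minimal sketch of a manager that stores one dataframe per country_index and pools them on request. It is not the PR's implementation: the class name SketchDataManager, its add and get methods, and the sample columns are all hypothetical, and plain pandas dataframes stand in for the GeoDataFrames the real manager returns.

```python
import pandas as pd


class SketchDataManager:
    """Hypothetical stand-in for DHSDataManager: keeps one dataframe per country_index."""

    def __init__(self):
        # Maps a user-defined country_index label to that country's dataframe.
        self._data = {}

    def add(self, country_index, df):
        # Tag each row with its label so rows stay traceable after pooling.
        tagged = df.copy()
        tagged["country_index"] = country_index
        self._data[country_index] = tagged
        return tagged

    def get(self, country_list=None):
        # None selects every loaded country, in the order they were added.
        keys = list(self._data) if country_list is None else country_list
        missing = [k for k in keys if k not in self._data]
        if missing:
            raise KeyError(f"country_index not loaded: {missing}")
        return pd.concat([self._data[k] for k in keys], ignore_index=True)


# Made-up sample values, for illustration only.
manager = SketchDataManager()
manager.add("Philippines", pd.DataFrame({"cluster_id": [1, 2], "wealth_index": [0.5, -0.2]}))
manager.add("Cambodia", pd.DataFrame({"cluster_id": [1], "wealth_index": [1.1]}))
combined = manager.get()
```

Keying the store by a caller-supplied label, rather than inferring the country from the file, keeps the sketch independent of any file naming scheme.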

Main functions

class DHSDataManager

Usage

The class is used in much the same way as the previous standalone function. For example:

from povertymapping import dhs
dhs_gdf = dhs.generate_dhs_cluster_level_data(<function parameters>)

becomes

from povertymapping import dhs
dhs_data_manager = dhs.DHSDataManager()
country_index = "Philippines"
dhs_gdf = dhs_data_manager.generate_dhs_cluster_level_data(country_index, <function parameters>)

The manager allows us to access the data for each country_index specified and generate combined dataframes:

# Load each country individually
ph_gdf = dhs_data_manager.generate_dhs_cluster_level_data("Philippines", <ph parameters>, return_data=True)
tl_gdf = dhs_data_manager.generate_dhs_cluster_level_data("Timor Leste", <tl parameters>, return_data=True)
mm_gdf = dhs_data_manager.generate_dhs_cluster_level_data("Myanmar", <mm parameters>, return_data=True)
kh_gdf = dhs_data_manager.generate_dhs_cluster_level_data("Cambodia", <kh parameters>, return_data=True)

# Combine data for the country_index values in the list (the default, None, returns all loaded countries)
ph_and_tl_gdf = dhs_data_manager.get_cluster_level_data_by_country(["Philippines", "Timor Leste"])
all_countries_gdf = dhs_data_manager.get_cluster_level_data_by_country()

# Return recalculated wealth index (pooled by the specified countries)
recomputed_index_gdf = dhs_data_manager.recompute_index_cluster_level()
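The PR text does not say how the pooled wealth index is recomputed. As a rough sketch only, one common approach is to take the first principal component of standardized asset features over the pooled rows. Everything below is an assumption for illustration: the function name pooled_first_component, the feature columns, and the sample values are hypothetical and are not taken from povertymapping.

```python
import numpy as np
import pandas as pd


def pooled_first_component(df, feature_cols):
    """Score each row on the first principal component of the pooled, standardized features."""
    X = df[feature_cols].to_numpy(dtype=float)
    # Standardize with pooled means and standard deviations (all countries together).
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are principal directions; the first captures the most variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # The sign of a principal component is arbitrary; orient it so the loadings sum is non-negative.
    if loadings.sum() < 0:
        loadings = -loadings
    return pd.Series(Z @ loadings, index=df.index, name="pooled_index")


# Hypothetical pooled cluster-level data (made-up values).
pooled = pd.DataFrame(
    {
        "country_index": ["Philippines", "Philippines", "Cambodia", "Cambodia"],
        "has_electricity": [1.0, 0.0, 1.0, 0.0],
        "rooms": [3.0, 1.0, 4.0, 2.0],
    }
)
pooled["pooled_index"] = pooled_first_component(pooled, ["has_electricity", "rooms"])
```

Because the standardization uses pooled statistics, a cluster's score is relative to all selected countries rather than to its own country alone, which is the point of pooling.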

Others

Let me know your feedback! Super happy to implement improvements to the manager :D

Addresses #109