Overview
This PR adds the class DHSDataManager to the povertymapping.dhs module. The class handles loading DHS data for individual countries while keeping track of which country each dataset belongs to, which will let us run cross-country operations later on. Its methods mirror the original functions as closely as possible, so the same usage patterns still apply. The key difference is that each data file you load must be labelled with a user-defined country_index string (e.g. Philippines, Cambodia) so the manager can keep track of the resulting dataframes.
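The bookkeeping described above can be pictured as a dictionary of dataframes keyed by country_index. The sketch below is a minimal illustration that assumes pandas. The class name CountryDataRegistry and its add/get_by_country methods are hypothetical and are not the actual DHSDataManager API.

```python
import pandas as pd


class CountryDataRegistry:
    """Hypothetical sketch: one dataframe per user-defined country_index label."""

    def __init__(self) -> None:
        self._data: dict[str, pd.DataFrame] = {}

    def add(self, country_index: str, df: pd.DataFrame) -> None:
        # Refuse to silently overwrite a country that is already loaded.
        if country_index in self._data:
            raise KeyError(f"{country_index!r} is already loaded")
        self._data[country_index] = df

    def get_by_country(self, country_indices: list[str]) -> pd.DataFrame:
        # Tag each frame with its label so rows stay traceable after concatenation.
        frames = [self._data[c].assign(country_index=c) for c in country_indices]
        return pd.concat(frames, ignore_index=True)


# Usage with toy data (values are made up for illustration).
registry = CountryDataRegistry()
registry.add("Philippines", pd.DataFrame({"cluster": [1, 2], "wealth": [0.1, 0.3]}))
registry.add("Cambodia", pd.DataFrame({"cluster": [1], "wealth": [-0.2]}))
combined = registry.get_by_country(["Philippines", "Cambodia"])
print(combined)
```

The sketch writes the label into each row as a column during concatenation. Without that column, a combined dataframe would have no record of which country each row came from, because the dictionary key alone is lost after concatenation.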
Main functions
class DHSDataManager
generate_dhs_cluster_level_data: generate geographic DHS data by cluster from the raw files
generate_dhs_household_level_data: generate household-level DHS data from the raw files
This is called by generate_dhs_cluster_level_data by default, so only call it directly if you need household-level data alone
get_cluster_level_data_by_country: generate a concatenated geographic dataframe for all specified country_index values
get_household_level_data_by_country: generate a concatenated household dataframe for all specified country_index values
recompute_index_household_level: recompute the wealth index based on the specified countries. Optionally, you can pass a list of features
recompute_index_cluster_level: recompute the wealth index based on the specified countries and aggregate by cluster. Optionally, you can pass a list of features
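The PR does not say how recompute_index_household_level and recompute_index_cluster_level build the index. One common approach for DHS-style wealth indices is to take the first principal component of standardized household asset indicators. The sketch below assumes that approach: it pools only the selected countries and then averages household scores per cluster. The function names, the country_index and cluster columns, and the asset features are all hypothetical.

```python
import numpy as np
import pandas as pd


def recompute_wealth_index(
    households: pd.DataFrame, countries: list[str], features: list[str]
) -> pd.DataFrame:
    """Hypothetical sketch: pooled-PCA wealth index over the selected countries."""
    subset = households[households["country_index"].isin(countries)].copy()
    x = subset[features].to_numpy(dtype=float)
    # Standardize each feature so no single asset dominates the components.
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (x - x.mean(axis=0)) / std
    # First principal component = eigenvector with the largest eigenvalue.
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    weights = eigvecs[:, np.argmax(eigvals)]
    # Eigenvector sign is arbitrary; flip it so that, assuming higher feature
    # values mean wealthier, a higher index means a wealthier household.
    if weights.sum() < 0:
        weights = -weights
    subset["wealth_index"] = z @ weights
    return subset


def aggregate_by_cluster(households: pd.DataFrame) -> pd.DataFrame:
    # Cluster-level value = mean household index within each (country, cluster).
    return households.groupby(["country_index", "cluster"], as_index=False)[
        "wealth_index"
    ].mean()


# Usage with toy binary asset indicators (made up for illustration).
households = pd.DataFrame(
    {
        "country_index": ["Philippines"] * 3 + ["Cambodia"] * 3,
        "cluster": [1, 1, 2, 1, 2, 2],
        "has_electricity": [1, 1, 0, 1, 0, 0],
        "has_tv": [1, 0, 0, 1, 1, 0],
    }
)
hh = recompute_wealth_index(
    households, ["Philippines", "Cambodia"], ["has_electricity", "has_tv"]
)
clusters = aggregate_by_cluster(hh)
print(clusters)
```

Because the components are fitted on the pooled selection, changing the list of countries changes the weights. That matches the idea of recomputing the index for a specific set of countries rather than reusing each survey's own index.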
Usage
The class is used in the same way as the previous standalone functions. For example:
from povertymapping import dhs
dhs_gdf = dhs.generate_dhs_cluster_level_data(<function parameters>)
becomes
The manager allows us to access the data for each country_index specified and generate combined dataframes:
Others
Let me know your feedback! Super happy to implement improvements to the manager :D
Addresses #109