worldbank / pip

Stata module to access World Bank’s Global Poverty and Inequality data
https://worldbank.github.io/pip
Creative Commons Attribution 4.0 International

Medians missing with pip.ado #28

Closed danielmahler closed 2 years ago

danielmahler commented 2 years ago

Hi all,

I'm in the process of preparing our annual data submission for SDG 10.2.1 (share living below half the median) and noticed that pip.ado returns a few missing medians:

pip, country(all) year(all)
keep if missing(median)

Most of these were also missing in povcalnet.ado. I noticed, though, that Argentina now has missing medians, which it didn't before. It would be nice not to have any missing values, but perhaps this is not the greatest problem on your to-do list.

Tefera19 commented 2 years ago

Hi @randrescastaneda

Is there a reason why medians are missing in the national-level estimates for chn, gnb, gtm, guy, idn, ind, nam, sle, and slv? I noticed that median estimates are calculated at the rural/urban level for the specified years.

Tefera

randrescastaneda commented 2 years ago

Hi @danielmahler,

In principle, we don't report medians at the national level for aggregate distributions like CHN, IDN, and IND. There are some group data countries, however, like SLV 1989, whose data do not meet the requirements for any of the Lorenz parametrization models. That is, they do not fit a normal distribution and/or the estimates of the regressions are invalid. Thus, we don't have distribution estimates for those countries. The table below shows the problematic country/years.

PIP

image

I checked against Povcalnet, and we have fewer problematic countries in PIP than in Povcalnet.

Povcalnet

image

Maybe for the next release we could include a section in the methodology handbook explaining why some group data distributions fail in the Lorenz parametrization and provide the list of countries. What do you think?

If it is ok with you, please close this issue.

Thanks.

cc: @tonyfujs

danielmahler commented 2 years ago

But how can we calculate poverty rates for the countries where the estimates of the regressions are invalid?

randrescastaneda commented 2 years ago

Fair question. Following the old practice of Povcalnet, PIP selects distributional stats and poverty stats with different but complementary algorithms. In the case of SLV 1989, for instance, the algorithm says that neither of the Lorenz parametrizations is valid for distributional stats. Yet the Lorenz Beta is still used for poverty stats because it meets normality, whereas the Lorenz quadratic does not. This is why we have estimates for poverty but not for distribution. It is an old practice, and I am not sure whether it should be revisited.

The functions of the algorithm can be found in the gd_select_lorenz.R file of the {wbpip} package. In production, we use the function prod_gd_select_lorenz(), which uses the function retrieve_poverty(). If you want, we could talk about it in detail over Teams.
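
To make the two-track selection concrete, here is a minimal Python sketch of the idea as described in this comment. It is not the code in gd_select_lorenz.R: the `LorenzFit` class, the `is_valid` and `is_normal` flags, and both selector functions are hypothetical names, and the real rules in {wbpip} may involve further checks. The sketch only assumes that distributional stats require a valid fit, while poverty stats require a fit that meets normality.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LorenzFit:
    """Hypothetical summary of one fitted Lorenz parametrization."""
    name: str
    is_valid: bool   # assumed flag: fit passes the validity conditions
    is_normal: bool  # assumed flag: fit passes the normality check


def select_for_distribution(fits: Sequence[LorenzFit]) -> Optional[LorenzFit]:
    """Return the first valid fit, or None if no fit is valid."""
    return next((f for f in fits if f.is_valid), None)


def select_for_poverty(fits: Sequence[LorenzFit]) -> Optional[LorenzFit]:
    """Return the first fit that meets normality, or None if none does."""
    return next((f for f in fits if f.is_normal), None)


# Flags for SLV 1989 as described in this thread (not read from real data).
slv_1989 = [
    LorenzFit("quadratic", is_valid=False, is_normal=False),
    LorenzFit("beta", is_valid=False, is_normal=True),
]

dist_choice = select_for_distribution(slv_1989)
pov_choice = select_for_poverty(slv_1989)

print("distributional stats:", dist_choice.name if dist_choice else "none (median missing)")
print("poverty stats:", pov_choice.name if pov_choice else "none")
```

Under these assumed flags, no fit is selected for distributional stats, so the median is missing, while the Beta fit is still selected for poverty stats.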

Feel free to close the issue if you think my answer is enough to leave it at this point.

Best,