worldbank / PIP-Methodology

Methodology page for the Poverty and Inequality Platform.
https://datanalytics.worldbank.org/PIP-Methodology/
MIT License

Documentation for several PIP data columns #19

Open awendland opened 1 year ago

awendland commented 1 year ago

Hello! Thank you for all the work you do making otherwise impossible-to-find data accessible for everyone.

I have been working with the PIP dataset available at this API: https://api.worldbank.org/pip/v1/pip?year=all&format=csv.
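For anyone wanting to reproduce this, a minimal Python sketch of loading that CSV with only the standard library could look like the following. The helper names `pip_url`, `parse_pip_csv`, and `fetch_pip` are hypothetical, and the only query parameters used are the two visible in the URL above (`year` and `format`).

```python
import csv
import io
import urllib.parse
import urllib.request

# Base endpoint taken from the URL quoted above.
PIP_ENDPOINT = "https://api.worldbank.org/pip/v1/pip"


def pip_url(year="all", fmt="csv"):
    """Build the query URL from the two parameters seen in the URL above."""
    query = urllib.parse.urlencode({"year": year, "format": fmt})
    return f"{PIP_ENDPOINT}?{query}"


def parse_pip_csv(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_pip(year="all"):
    """Download and parse the CSV (needs network access; not called on import)."""
    with urllib.request.urlopen(pip_url(year)) as response:
        return parse_pip_csv(response.read().decode("utf-8"))
```

Calling `fetch_pip()` would return one dict per row, with string values keyed by column names such as `headcount`, `poverty_gap`, and `median`; numeric conversion is left to the caller.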

It's a wonderful consolidation of information from all the national surveys, and I'm grateful for the backing documentation in this repository explaining how it was put together.

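However, I've been struggling to interpret a few columns and was hoping we could expand the documentation to provide explicit overviews of them. In particular:

  • median - Over which periods is this column reported? Are these yearly values in thousands of dollars? Are these daily values in single dollars? I can't find the words "daily" or "day" in this repository in relation to this column.
  • reporting_gdp - What is this value? Is this GDP per capita?
  • reporting_pce - What is this value?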

Apologies if I missed this somewhere, but a reference table mapping the CSV/JSON columns/keys to the corresponding documentation in the PIP methodology book would be helpful. Some columns, like headcount, are hard to map because they lack the poverty_ prefix that poverty_gap and poverty_severity have, which gives those two clearer 1:1 mappings on this page.

And another apology if this repository is not the appropriate place for this discussion. Please let me know if there is a more appropriate forum. As far as I could tell, this Poverty and Inequality Platform Methodology Handbook was the best source of documentation for the data available at https://pip.worldbank.org/api. I think people would also benefit if this handbook were linked from the API page, because currently the API page doesn't have any documentation about the responses.

Thank you again for all the work you do!

tonyfujs commented 1 year ago

Hi @awendland Thanks for reaching out. This is very helpful feedback. There is a dedicated data dictionary endpoint, but

You can access the data dictionary here: https://api.worldbank.org/pip/v1/aux?table=dictionary

We will update it as soon as possible. In the meantime, please find responses to your questions below:

  • median - Over which periods is this column reported? Are these yearly values in thousands of dollars? Are these daily values in single dollars? I can't find the words "daily" or "day" in this repository in relation to this column.

The median of daily household per capita income or consumption expenditure from the survey, expressed in PPP terms.
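Because the value is daily, comparing it with yearly figures needs an explicit conversion. A minimal sketch, assuming a 365-day year; the helper name `annualize_daily` is hypothetical and the input below is illustrative, not a real PIP value.

```python
def annualize_daily(value_per_day, days_per_year=365):
    """Convert a per-person daily amount (PPP terms) to a yearly amount."""
    return value_per_day * days_per_year


# Illustrative input only: a median of 2 PPP dollars per person per day.
print(annualize_daily(2))  # prints 730
```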

  • reporting_gdp - What is this value? Is this GDP per capita?
  • reporting_pce - What is this value?

Both are per capita values.

Apologies if I missed this somewhere, but a reference table mapping the CSV/JSON columns/keys to the corresponding documentation in the PIP methodology book would be helpful. Some columns, like headcount, are hard to map because they lack the poverty_ prefix that poverty_gap and poverty_severity have, which gives those two clearer 1:1 mappings on this page.

Very good point. We'll try to make that mapping clearer, and we can also add a link to the correct documentation chapter in the data dictionary.

And another apology if this repository is not the appropriate place for this discussion. Please let me know if there is a more appropriate forum. As far as I could tell, this Poverty and Inequality Platform Methodology Handbook was the best source of documentation for the data available at https://pip.worldbank.org/api. I think people would also benefit if this handbook were linked from the API page, because currently the API page doesn't have any documentation about the responses.

It is perfectly fine to ask questions here. I like your idea about linking to the methodology book from the API documentation page. We'll see if this is technically feasible.

I hope you're finding the site and API useful. Thanks again for your feedback!

areckenrode commented 1 year ago

Following up on a similar topic, I found the indicators table (https://api.worldbank.org/pip/v1/aux?table=indicators) to contain more useful information than the dictionary; however, this file also lacked content related to reporting_gdp and reporting_pce.
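One way to check which data columns have no entry in either auxiliary table is a simple set difference. The sketch below assumes each table has already been loaded as a list of dicts with a name field; the field name `variable`, the helper names, and the sample entries are assumptions for illustration, not the confirmed schema of either endpoint.

```python
import urllib.parse

# Base endpoint taken from the auxiliary-table URLs quoted in this thread.
AUX_ENDPOINT = "https://api.worldbank.org/pip/v1/aux"


def aux_url(table):
    """Build the URL for an auxiliary table such as 'dictionary' or 'indicators'."""
    return f"{AUX_ENDPOINT}?{urllib.parse.urlencode({'table': table})}"


def undocumented_columns(data_columns, *tables, name_field="variable"):
    """Return, in sorted order, the data columns that appear in none of the tables."""
    documented = {
        row[name_field] for table in tables for row in table if name_field in row
    }
    return sorted(set(data_columns) - documented)


# Hypothetical, hand-written entries standing in for the downloaded tables.
dictionary = [{"variable": "headcount"}, {"variable": "median"}]
indicators = [{"variable": "poverty_gap"}]
columns = ["headcount", "median", "poverty_gap", "reporting_gdp", "reporting_pce"]
print(undocumented_columns(columns, dictionary, indicators))
# prints ['reporting_gdp', 'reporting_pce']
```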

tonyfujs commented 1 year ago

Thanks, @areckenrode. This is very helpful, and we could probably use some consolidation between the indicators and dictionary tables.

As for PCE, it indeed stands for Personal Consumption Expenditures. This section of the methodology handbook should answer your question: https://datanalytics.worldbank.org/PIP-Methodology/welfareaggregate.html#incomeorconsumption

Please feel free to reach out if you need more information.