nealjean / predicting-poverty

Combining satellite imagery and machine learning to predict poverty
http://sustain.stanford.edu/predicting-poverty
MIT License

Object "pcexp_dr_w2" not found #24

Closed thegcamilo closed 5 years ago

thegcamilo commented 5 years ago

Hello,

I've tried running the ProcessSurveyData.R script, but I keep getting the following error:

Error in data.frame(hhid = hhid, cons = pcexp_dr_w2/365) : object 'pcexp_dr_w2' not found

The error comes from the following part of the code:

nga13.cons <- read.dta('./data/input/LSMS/DATA/cons_agg_wave2_visit2.dta') %$%
    data.frame(hhid = hhid, cons = pcexp_dr_w2/365)
nga13.cons$cons <- nga13.cons$cons*110.84/(79.53*100)
nga13.geo <- read.dta('./data/input/LSMS/DATA/Geodata Wave 2/NGA_HouseholdGeovars_Y2.dta')

How can I fix the code?

nealjean commented 5 years ago

We are no longer actively supporting this repo but it's possible that @wmadavis may have ideas... Any idea what the issue could be?

hans-ekbrand commented 5 years ago

Note that ./data/input/LSMS/DATA/cons_agg_wave2_visit2.dta is not the file that ProcessSurveyData.R reads in the upstream source (this repo). The original at https://github.com/nealjean/predicting-poverty/blob/master/scripts/ProcessSurveyData.R has this:

## Nigeria ##
nga13.cons <- read.dta('data/input/LSMS/DATA/cons_agg_w2.dta') %$%
data.frame(hhid = hhid, cons = pcexp_dr_w2/365)

However, this file, cons_agg_w2.dta, is no longer part of the zip archive for LSMS Nigeria provided at http://microdata.worldbank.org/index.php/catalog/1952/get_microdata. So the problem is that the World Bank has changed the contents of the data. As a workaround I've commented out all references to LSMS Nigeria in the scripts, but it would be great if the World Bank could update their archive to include that file again, or if it could be made available by other means.
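
One way to surface this kind of upstream change early is to check for the expected file before reading it. The following is a minimal sketch, not code from the repo: `require_survey_file` is a hypothetical helper, and the demonstration uses a temporary folder in place of the real data directory.

```r
# Hypothetical helper (not part of the repo): stop with a readable message
# if an expected survey file is missing, listing the .dta files that are
# present in the same folder instead.
require_survey_file <- function(path) {
  if (!file.exists(path)) {
    available <- list.files(dirname(path), pattern = "\\.dta$",
                            ignore.case = TRUE)
    stop("Missing file: ", path,
         "\nFiles found in that folder: ",
         paste(available, collapse = ", "))
  }
  invisible(path)
}

# Demonstration in a temporary folder rather than the real data directory.
demo_dir <- file.path(tempdir(), "LSMS_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "cons_agg_wave2_visit2.dta")))

# The file that exists passes the check and its path is returned.
ok <- require_survey_file(file.path(demo_dir, "cons_agg_wave2_visit2.dta"))

# The file that is absent triggers the error; capture the message.
missing_msg <- tryCatch(
  require_survey_file(file.path(demo_dir, "cons_agg_w2.dta")),
  error = function(e) conditionMessage(e)
)
```

Calling such a check just before each `read.dta(...)` would turn a later "object not found" error into an immediate message naming the missing file.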

Kind regards,

Hans Ekbrand, University of Gothenburg, Sweden.

nealjean commented 5 years ago

Thanks for the info Hans!

wmadavis commented 5 years ago

Hi all,

Sorry for getting to this late. Hans' explanation is correct; I've had to re-write that script before in response to the World Bank updating the same Nigerian dataset post-publication without warning (can be seen in an edit to the ReadMe) and it appears they've done it again.

Gabriel, I've just downloaded the dataset in its current form from Hans' link and would tentatively recommend doing the same. Note that the folder of Nigerian data will no longer be named DATA so you'd have to replace every instance of /data/input/LSMS/DATA/ with /data/input/LSMS/NGA_2012_GHSP-W2_v02_M_STATA/. Then I would recommend making the following adjustment to the part of the code that you've highlighted:

nga13.cons <- read.dta('data/input/LSMS/NGA_2012_GHSP-W2_v02_M_STATA/cons_agg_wave2_visit2.dta') %$%
    data.frame(hhid = hhid, cons = totcons/365)
nga13.cons$cons <- nga13.cons$cons*110.84/(79.53*100)
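
A more defensive variant of this adjustment could check which consumption column the downloaded file actually contains before dividing by 365. The sketch below is only an illustration: `find_cons_column` is a hypothetical helper that is not in the repo, the toy data frame stands in for the result of `read.dta(...)`, and its values are made up. The conversion constants are copied from the snippet above as-is, and whether `pcexp_dr_w2` and `totcons` are measured on the same basis is not established in this thread.

```r
# Hypothetical helper (not part of the repo): return the first candidate
# column name that exists in the data frame, or stop with a clear message.
find_cons_column <- function(df, candidates = c("pcexp_dr_w2", "totcons")) {
  present <- candidates[candidates %in% names(df)]
  if (length(present) == 0) {
    stop("No consumption column found. Available columns: ",
         paste(names(df), collapse = ", "))
  }
  present[1]
}

# Toy data standing in for read.dta(...) on the current Nigerian file;
# the values are invented for illustration only.
raw <- data.frame(hhid = c(10001, 10002), totcons = c(73000, 146000))

# Pick whichever consumption column is present and convert to a daily value.
cons_col <- find_cons_column(raw)
nga13.cons <- data.frame(hhid = raw$hhid, cons = raw[[cons_col]] / 365)

# Same conversion as in the thread, constants copied verbatim.
nga13.cons$cons <- nga13.cons$cons * 110.84 / (79.53 * 100)
```

If neither column is present, the helper stops and prints the available column names, which makes a future renaming by the data provider easier to diagnose.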

Let me know if this resolves the issue.

Best,

Matt

thegcamilo commented 5 years ago

Thank you. It solved it!