worldbank / dime-data-handbook

Development Research in Practice: The DIME Analytics Data Handbook. By Kristoffer Bjärkefur, Luíza Cardoso de Andrade, Benjamin Daniels, and Maria Jones
https://worldbank.github.io/dime-data-handbook/

Data cleaning rewrite #450

luizaandrade closed 4 years ago

luizaandrade commented 4 years ago

Ok, here it is. I'm sure there are still a lot of adjustments to be made, but a fresh set of eyes would help. I went back and forth on the order of sections, so that may be something to pay particular attention to.

bbdaniels commented 4 years ago

Very well written overall and a great improvement. Almost no more content is needed, but a little bit of structural cleanup is. The big need is to make the flow and logic a little more accessible, that is, to clear up exactly why the workflow is:

  1. Tidy
  2. Clean, including HFCs
  3. De-ID

The info is there but the following points need emphasis:

So perhaps the "preparing data for analysis" section can become an introduction that summarizes all this right at the beginning? The "correcting data points" subsection there should be part of "data cleaning", in my opinion. Small notes are attached in annotations.

d4di-ch5-bbd.pdf

luizaandrade commented 4 years ago

Cleaning: make the data look like the survey. That's what tidying does.

kbjarkefur commented 4 years ago

This is coming around really well! Almost all of my comments are very minor and should be possible to address very quickly! Great work!