worldbank / dime-data-handbook

Development Research in Practice: The DIME Analytics Data Handbook. By Kristoffer Bjärkefur, Luíza Cardoso de Andrade, Benjamin Daniels, and Maria Jones
https://worldbank.github.io/dime-data-handbook/

Review ch5 #544

Closed luizaandrade closed 3 years ago

kbjarkefur commented 3 years ago

I like the text you added saying that, whether it is survey data or admin data, "the key aspects to have in mind are data completeness, consistency and distribution". However, I partly agree with Jim that we seem to be suggesting survey-only quality methods for both survey and admin data. We seem to refer to both survey and admin data when we say "Data quality assurance requires a combination of real-time data checks and back-checks or validation audits", but I think those apply only to survey data. I also agree with you that we say what we need to say, but I think we should restructure it.

I think we should start by saying that for both "the key aspects to have in mind are data completeness, consistency and distribution", and then say that for survey data you have additional tools because you can run checks while the data is being collected. Those tools are "real-time data checks and back-checks or validation audits". Then we talk about those survey-specific tools. Then we end by talking about tabulations and such (which we already do), which address "data completeness, consistency and distribution" and apply to both. Does that make sense? I think it comes down to being clear about what applies to both and what applies only to survey data, like HFCs.
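
To make the three aspects concrete, here is a minimal sketch of checks that would apply equally to survey and admin data. It is only an illustration, not code from the handbook: the records, the field names (`id`, `age`, `household_size`, `n_children`) and the consistency rule are all hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": "H01", "age": 34, "household_size": 4, "n_children": 2},
    {"id": "H02", "age": None, "household_size": 3, "n_children": 1},
    {"id": "H03", "age": 51, "household_size": 2, "n_children": 3},
    {"id": "H03", "age": 29, "household_size": 5, "n_children": 0},
]


def completeness(rows, fields):
    """Return the share of non-missing values per field and any duplicated IDs."""
    n = len(rows)
    share = {f: sum(r[f] is not None for r in rows) / n for f in fields}
    counts = Counter(r["id"] for r in rows)
    duplicates = sorted(i for i, c in counts.items() if c > 1)
    return share, duplicates


def consistency(rows):
    """Return IDs of rows where children outnumber household members (assumed rule)."""
    flagged = []
    for r in rows:
        if r["n_children"] is None or r["household_size"] is None:
            continue  # missing values are a completeness issue, not a consistency one
        if r["n_children"] > r["household_size"]:
            flagged.append(r["id"])
    return flagged


def distribution(rows, field):
    """Return simple summary statistics for one numeric field, ignoring missing values."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


if __name__ == "__main__":
    print(completeness(records, ["age", "household_size"]))
    print(consistency(records))
    print(distribution(records, "age"))
```

None of these functions depends on how the data was collected, which is the point: they could be run once on a delivered admin dataset or repeatedly on incoming survey submissions.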

luizaandrade commented 3 years ago

I think real-time quality checks apply to both. But I agree that back-checks do not. Could we say something like "in the case of survey data, real-time data quality checks include back-checks and validation audits"?

kbjarkefur commented 3 years ago

I will wait for the last items before I do my final review. I made a comment in https://github.com/worldbank/dime-data-handbook/pull/550 on whether it makes sense to split data quality checks between two chapters. I do not think it does.

In addition to the real-time aspect, I think there are survey-specific checks, and those quality checks focus on the huge source of error that comes from humans recording the answers. I think that is what back-checks, validation audits and much of HFCs focus on. Survey data quality checks should always be done in real time when possible. An example of when it is not possible is when we receive survey data for which the field activities have already concluded.
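
As an illustration of why these checks are survey-specific, here is a minimal sketch of a back-check comparison: a subsample of households is re-interviewed and the answers are compared with the original submission, by enumerator. The data, the field names and the choice of a simple mismatch rate are assumptions made for this example, not the handbook's method.

```python
from collections import defaultdict

# Hypothetical original submissions, keyed by household ID.
original = {
    "H01": {"enumerator": "E1", "age": 34, "household_size": 4},
    "H02": {"enumerator": "E1", "age": 40, "household_size": 3},
    "H03": {"enumerator": "E2", "age": 51, "household_size": 2},
}

# Hypothetical back-check answers for a re-interviewed subsample.
back_check = {
    "H01": {"age": 34, "household_size": 4},
    "H03": {"age": 15, "household_size": 3},
}


def mismatch_rates(original, back_check, fields):
    """Return, per enumerator, the share of back-checked answers that differ."""
    compared = defaultdict(int)
    mismatched = defaultdict(int)
    for hh_id, recheck in back_check.items():
        first = original.get(hh_id)
        if first is None:
            continue  # back-check for a household with no original submission
        for field in fields:
            compared[first["enumerator"]] += 1
            if first[field] != recheck[field]:
                mismatched[first["enumerator"]] += 1
    return {e: mismatched[e] / compared[e] for e in compared}


if __name__ == "__main__":
    print(mismatch_rates(original, back_check, ["age", "household_size"]))
```

A check like this only makes sense when there is an enumerator and a respondent to go back to, which is why it has no obvious counterpart for admin or other secondary data.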

Then there is secondary data that is collected in real time, for example CDR data and remote sensing. I think the quality checks there are similar to the quality checks done for non-real-time secondary data; the only difference is that we run those checks frequently, in real time. So I do not think CDR data has more in common with survey data than any other secondary data does, even though it is received in real time.

Let me know if you disagree.

luizaandrade commented 3 years ago

I disagree. I think most of the survey-specific content in chapter 5 is about things that should be thought of before the data is acquired, and alongside survey planning. The changes in 87d131a reflect the content that I think should be moved to ch 4. I also added more language on this in c4c24db as I wrote this comment, to give examples of both survey and other data sources.

mariaruth commented 3 years ago

> I disagree. I think most of the survey-specific content in chapter 5 is about things that should be thought of before the data is acquired, and alongside survey planning. The changes in 87d131a reflect the content that I think should be moved to ch 4. I also added more language on this in c4c24db as I wrote this comment, to give examples of both survey and other data sources.

Luiza, I agree. I saw the two code chunks you highlighted and will work on integrating them in ch 4.

kbjarkefur commented 3 years ago

Ok, I am happy with what you are suggesting!