Development Research in Practice: The DIME Analytics Data Handbook. By Kristoffer Bjärkefur, Luíza Cardoso de Andrade, Benjamin Daniels, and Maria Jones
@kbjarkefur 8
Are yes or no questions consistently coded as 0 and 1, or 1 and 2?
@bbdaniels
What about "binary" questions rather than yes/no? (Thinking also of gender-type stuff where I've seen very basic mistakes get made)
@luizaandrade
Ok, I see that this is not clear. But not sure how to address it. It doesn't really matter in this example if they are consistently coded, or if they are categorical or binary. It just matters that you're mindful of the underlying value. Will try to rephrase it.
@luizaandrade
Suggested change
Are yes or no questions coded as 0 and 1, or 1 and 2?
In Stata, categorical variables have underlying numerical values, so it is important to be mindful of what they are when using this type of variable: a variable with "Yes" and "No" options, for example, may be coded as 0 and 1, or just as well as 1 and 2.
@kbjarkefur
While it is possible to code them differently, isn't that very error prone? Should we recommend that all yes/no variables be coded as 1/0, so that their means can be read as frequencies and they can be used directly as dummies in regressions?
@luizaandrade
Yes, you're right. We should.
@bbdaniels
Yes, we should, but we should leave this part as is, so something like:
Are yes/no questions coded as 0 and 1, or 1 and 2? Make sure they are recoded to 0 and 1. Similarly, you should typically generate dummies for each category in multiple-response variables (such as gender).
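A minimal Stata sketch of the recoding suggested above may help make the recommendation concrete. The data and the variable names (`owns_land`, `crop`) are hypothetical and only illustrate the pattern; they are not taken from the handbook.

```stata
* Hypothetical example data: a yes/no item coded 1 = Yes, 2 = No,
* and a categorical variable with three categories
clear
input byte owns_land byte crop
1 1
2 2
1 3
2 1
end

* Recode 1/2 to 1/0 so the mean is the share of "Yes" answers
recode owns_land (2 = 0)
label define yesno 0 "No" 1 "Yes"
label values owns_land yesno

* Check that only 0, 1, or missing values remain
assert inlist(owns_land, 0, 1) | missing(owns_land)

* One dummy per category: creates crop_1, crop_2, and crop_3
tabulate crop, generate(crop_)
```

With 0/1 coding, `summarize owns_land` reports the share of "Yes" answers directly, and the variable can enter a regression as a dummy without further transformation.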
https://github.com/worldbank/d4di/blob/afc7c0d49da7e9a6c641602904986a0383dd45b2/chapters/data-analysis.tex#L344-L350
Let's discuss and agree on what we are recommending.