worldbank / dime-data-handbook

Development Research in Practice: The DIME Analytics Data Handbook. By Kristoffer Bjärkefur, Luíza Cardoso de Andrade, Benjamin Daniels, and Maria Jones
https://worldbank.github.io/dime-data-handbook/

ch 6 - recommendation for categories with two values - 1/2 or 0/1 #345

Closed kbjarkefur closed 4 years ago

kbjarkefur commented 4 years ago

https://github.com/worldbank/d4di/blob/afc7c0d49da7e9a6c641602904986a0383dd45b2/chapters/data-analysis.tex#L344-L350


@kbjarkefur 8 Are yes or no questions consistently coded as 0 and 1, or 1 and 2?

@bbdaniels What about "binary" questions rather than yes/no? (Thinking also of gender-type stuff where I've seen very basic mistakes get made)

@luizaandrade Ok, I see that this is not clear. But not sure how to address it. It doesn't really matter in this example if they are consistently coded, or if they are categorical or binary. It just matters that you're mindful of the underlying value. Will try to rephrase it.

@luizaandrade Suggested change: Are yes or no questions coded as 0 and 1, or 1 and 2? In Stata, categorical variables have underlying numerical values, so it is important to be mindful of what they are when using this type of variable: a variable with "Yes" and "No" options, for example, may be coded as 0 and 1, or just as well as 1 and 2.

@kbjarkefur While it is possible to code them differently, isn't that very error prone? Should we recommend that all yes/no questions be coded as 1/0, so that they can be used numerically as frequencies in means and as dummies in regressions?
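
To illustrate the point about means, here is a minimal sketch in plain Python (not Stata, and not taken from the handbook). The `answers` list and both coding schemes are made up for illustration. It only shows that the mean of a 0/1 variable can be read as the share of "yes" answers, while the mean of the same answers coded 1/2 cannot.

```python
from statistics import mean

# Hypothetical responses to one yes/no question: 3 "yes" out of 4.
answers = ["yes", "yes", "no", "yes"]

# Coding A (assumed): no = 0, yes = 1.
coded_01 = [1 if a == "yes" else 0 for a in answers]

# Coding B (assumed): yes = 1, no = 2.
coded_12 = [1 if a == "yes" else 2 for a in answers]

# Under coding A the mean is the share of "yes" answers.
share_yes_01 = mean(coded_01)

# Under coding B the mean is just an average of labels, not a share.
mean_12 = mean(coded_12)

print(share_yes_01, mean_12)
```

The same answers give 0.75 under the 0/1 coding, which matches 3 out of 4 "yes", and 1.25 under the 1/2 coding, which has no direct interpretation as a frequency.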

@luizaandrade Yes, you're right. We should.

@bbdaniels Yes, we should, but we should leave this part as is, so something like:

Are yes/no questions coded as 0 and 1, or 1 and 2? Make sure they are recoded to 0 and 1. Similarly, you should typically generate dummies for each category in multiple-response variables (such as gender).
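
As a rough sketch of what this wording recommends, again in plain Python rather than Stata, the snippet below recodes a 1/2 variable to 0/1 and builds one dummy per category. The raw codes (1 = yes, 2 = no), the `gender` values, and the `gender_` naming are all assumptions made for the example, not something the thread or the handbook specifies.

```python
# Hypothetical raw codes as exported from a survey tool: 1 = yes, 2 = no.
raw_yes_no = [1, 2, 1, 1]

# Explicit recode map, so an unexpected code raises a KeyError
# instead of being silently treated as "no".
RECODE = {1: 1, 2: 0}
yes_dummy = [RECODE[value] for value in raw_yes_no]

# Hypothetical categorical variable with more than two categories.
gender = ["female", "male", "female", "other"]

# One 0/1 dummy per category, keyed by an assumed "gender_<category>" name.
categories = sorted(set(gender))
dummies = {
    f"gender_{category}": [int(g == category) for g in gender]
    for category in categories
}

print(yes_dummy)
print(dummies)
```

Using an explicit map rather than arithmetic such as `2 - value` is a design choice here: it fails loudly on codes like missing-value placeholders, which is the kind of basic mistake the discussion is worried about.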


Let's discuss and agree on what we are recommending.

bbdaniels commented 4 years ago

Resolved IMO

kbjarkefur commented 4 years ago

Agreed, I think it was resolved well in 86e10a738410f6d7ae007c9519c0e847a5e857c4 around line 450. I vote for closing.