Open evanmiller opened 3 years ago
Thanks for commenting, I appreciate you spending some time with the data.
1) is something I actually went back and forth about. I'm aware of the measurement error, but, in general, I'd rather not throw away observations on the basis of a variable that isn't directly of interest if I don't have to. I looked at the model both ways, including vs. excluding the population = 0 rows, and it doesn't make a big difference in the inference, so I kept them in.
2) I'd also considered this, but I don't think it makes sense here for a couple of reasons. More than anything, the research papers I replicated mention that population affects the number of killings in ways other than just being an "opportunity," which turned me off the idea. Lawson, for example, lists a couple of ways that population acts on the psychology of officers. It just doesn't seem like we can assume a fixed coefficient of 1.
In general, when I'm replicating a study like this, I'd like to stay as close to the work as possible, unless they're making a clear error, which doesn't seem to be the case here.
Thanks for the fast reply. I think econometrically you'd want to set unknown populations to the mean population (rather than 0) so that the measurement error would be centered around zero. Regarding (2), you could include population both as an offset and as a separate coefficient to account for the officer psychology. But some of the other variables would then likely need to be re-specified as per-capita figures.
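To make the two suggestions concrete, here is a minimal sketch in R on simulated data. The Poisson family, the column names (`killings`, `population`, `spending`) and all numbers are assumptions for illustration, not the repo's actual model or data.

```r
# Simulated data; all column names and values are hypothetical, not the repo's.
set.seed(1)
n <- 500
d <- data.frame(
  population = round(exp(rnorm(n, mean = 10, sd = 1))),
  spending   = rexp(n)
)
d$killings <- rpois(n, lambda = d$population * exp(-9 + 0.1 * d$spending))

# Mark 20% of populations as unknown (recorded as 0), then impute the mean
# of the known values instead of leaving them at 0.
unknown <- sample(n, n / 5)
d$population[unknown] <- 0
d$population[d$population == 0] <- mean(d$population[d$population > 0])

# Offset only: the coefficient on log(population) is fixed at 1.
m_offset <- glm(killings ~ spending + offset(log(population)),
                family = poisson, data = d)

# Offset plus a free log(population) term: the estimated coefficient is the
# departure from 1, i.e. from strict proportionality.
m_both <- glm(killings ~ spending + log(population) + offset(log(population)),
              family = poisson, data = d)

coef(m_both)
```

In the second model, a `log(population)` coefficient near zero would be consistent with pure exposure, while a clearly nonzero value would point to population acting through other channels as well.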
A couple more points, which you can take or leave:
I would think "year" should be included as a fixed effect rather than a linear regressor - but when I tried it I failed to get convergence.
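For reference, the difference between the two specifications might look like the sketch below. It uses simulated data with hypothetical column names and years and assumes a Poisson family, so it says nothing about whether the fixed-effects version would converge on the real data.

```r
# Simulated panel; names, years and values are hypothetical.
set.seed(2)
d <- expand.grid(agency = 1:100, year = 2006:2014)
d$spending <- rexp(nrow(d))
d$killings <- rpois(nrow(d), lambda = exp(0.2 + 0.1 * d$spending))

# Linear trend: a single slope for year.
m_trend <- glm(killings ~ spending + year, family = poisson, data = d)

# Fixed effects: one dummy per year (the first year is the baseline).
m_fe <- glm(killings ~ spending + factor(year), family = poisson, data = d)

length(coef(m_trend))  # intercept + spending + one year slope
length(coef(m_fe))     # intercept + spending + eight year dummies
```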
Since military equipment is a durable good, it might make sense to use cumulative spending (possibly depreciated) as the regressor rather than annual spending.
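One way to build such a regressor is a simple perpetual-inventory recursion, sketched below in base R. The depreciation rate `delta = 0.1`, the helper name `depreciated_stock` and the toy data are all assumptions for illustration.

```r
# Depreciated cumulative stock: stock_t = (1 - delta) * stock_{t-1} + spend_t.
# delta = 0.1 is an arbitrary illustrative rate, not an estimate.
depreciated_stock <- function(spend, delta = 0.1) {
  Reduce(function(prev, s) (1 - delta) * prev + s, spend, accumulate = TRUE)
}

# Hypothetical agency-year data, sorted by year within agency.
d <- data.frame(
  agency   = c("A", "A", "A", "B", "B", "B"),
  year     = c(2010, 2011, 2012, 2010, 2011, 2012),
  spending = c(100, 0, 50, 10, 10, 10)
)

# Apply the recursion separately within each agency.
d$stock <- ave(d$spending, d$agency, FUN = depreciated_stock)
d
```

Setting `delta = 0` reduces this to a plain running total, so the undepreciated variant is just a special case.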
In any event it is clear that you have put a ton of work into this project and I'm very glad you've made the sources and methods publicly available! I will take a look at your references to get a better sense of the previous work in this area.
Not sure if this is the right place, but I wanted to offer a few brief comments on the 1033 data and analysis in this repo:
1. I see that population=0 in many of the data rows. This looks like non-normal measurement error that may wreck the regression estimates.
2. Population would be a good candidate for an exposure variable (i.e. estimate killings per population). This can be included as such by adding
+ offset(log(population))
to the glm specification.

Addressing 1 and 2 (i.e. dropping zero population observations and using an offset variable), I get an estimated effect about 1/3 the size of that reported. I also see that the med_inc coefficient turns negative (and becomes statistically insignificant), which resolves the unintuitive result that was reported.

Model:
Estimates:
Happy to discuss in more detail (or not!).