uchicago-computation-workshop / 2018_spring_conference


Chelsea Ernhofer -- Broken Windows and Teen Birth Rates #7

Open cernhofer opened 6 years ago

cernhofer commented 6 years ago

🙌

jfan3 commented 6 years ago

Do you link your multiple data sources down to the individual level?

Thanks!

bensoltoff commented 6 years ago
rickecon commented 6 years ago

I like your project. Good use of linear regression, decision trees, and random forests.

jamesallenevans commented 6 years ago

Physical environment sends a direct message (ecological effect). Links data on demographic variables and perceived safety/disorder to teen pregnancy at the zip code level. Why so high a level of aggregation? Why not Census block (with ACS)? Linear results presented. Why not cover all urban areas in all cities, with ACS linked to data on perceived safety? Super interesting / provocative association.

bethbailey commented 6 years ago

How do you control for the fact that built environments with a lot of disorder could have high correlations with other theorized causes for high teen birth rates?

Alicechung commented 6 years ago

I like the dataset you used, and that you applied deep learning and validated your results with different validation methods. It would be great if you could show the results with graphs :)

w4rner commented 6 years ago

Nice pres. Possible extension: how do rural physical environments compare to urban ones (both high & low disorder)?

xiuyuanzhang commented 6 years ago

This is really cool! Can you explain further how the social and ecological aspects are distinguished? It seems that the crime rate variable belongs to the social aspect but is still included in your model?

rodrigovaldes commented 6 years ago

Beautiful presentation! I believe you need to convince the audience that you do not have a problem of omitted variables (everything correlated with your index of order).

jmausolf commented 6 years ago

Great, excited to see you using both regression and ML models. One thought on the regression relative to the outcome: if the outcomes are ordered survey response items, an ordered logistic regression or multinomial logit may be better, depending on the dispersion of the DV.

ruixue-li commented 6 years ago

Have you tried doing some cross validation?

cernhofer commented 6 years ago

@jfan3 I WISH! Individual level data connected to geographical context is hard/impossible to come by. I actually spent most of my time last quarter trying to find individual level data that I could add to the model. Thanks for the comment!

cernhofer commented 6 years ago

@bensoltoff Thanks for the comment!

cernhofer commented 6 years ago

@jamesallenevans thanks for the comment!

Basically the answer to everything is data constraints. The perinatal (teen birth) data I have is at the zip code level, and the StreetScore data is available only for New York City and Boston. My next step is to expand to this second city.

cernhofer commented 6 years ago

@bethbailey Basically just by including some of those alternative explanations as controls in the model. I think there's a large weakness in my analysis in that I have no way of looking at individual/family level variance, which I think could have a large influence. However, the high R-squared and the significance of neighbourhood order in the linear model give me some confidence, at least, that the work isn't pure garbage.

Thanks for the comment!
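For readers following the omitted-variable discussion, here is a minimal sketch of why adding a correlated cause as a control changes the estimate. It is not the project's code or data: the variables `poverty`, `disorder`, and `birth_rate` are synthetic, and the coefficients (a true disorder effect of 1.0) are assumptions chosen for illustration. The controlled slope is computed by partialling the confounder out of both variables (the Frisch-Waugh-Lovell approach), which matches the coefficient a two-predictor least-squares fit would give.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """Residuals of ys after a simple least-squares regression on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - mean_y) - b * (x - mean_x) for x, y in zip(xs, ys)]

# Synthetic data (assumed, not from the study): poverty drives both
# disorder and the birth rate; the true effect of disorder is 1.0.
rng = random.Random(0)
n = 5000
poverty = [rng.gauss(0, 1) for _ in range(n)]
disorder = [0.8 * p + rng.gauss(0, 1) for p in poverty]
birth_rate = [1.0 * d + 2.0 * p + rng.gauss(0, 1)
              for d, p in zip(disorder, poverty)]

# Without the control, the disorder slope absorbs part of poverty's effect.
naive = slope(disorder, birth_rate)

# With the control: remove poverty from both variables, then regress.
controlled = slope(residuals(poverty, disorder),
                   residuals(poverty, birth_rate))

print(f"naive slope: {naive:.2f}, controlled slope: {controlled:.2f}")
```

The control only helps for confounders that are actually measured, which is the weakness acknowledged above for individual and family level factors.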

cernhofer commented 6 years ago

@Alicechung Great comment- thanks! You're totally right!

cernhofer commented 6 years ago

@lpwarner Thanks for the comment- that would be super interesting. Unfortunately, the specific way I'm operationalizing neighbourhood order/disorder (StreetScore) has only been applied to a few select urban areas in the States. If they ever make their algorithm public I would do that in an instant!

cernhofer commented 6 years ago

@xiuyuanzhang Thanks for the comment. I included crime mostly as a control for at least some form of social influence, but you're correct that it definitely creates confusion when I describe my model as purely ecological.

cernhofer commented 6 years ago

@rodrigovaldes Thanks!

cernhofer commented 6 years ago

@jmausolf I don't know if I understand your question. My response variable is a quantitative rate of teen birth. I'm not quite sure if there's an intuitive method whereby I could divide this variable into two groups in order to warrant a logistic regression.

cernhofer commented 6 years ago

@ruixue-li Yes! I didn't talk about it due to time constraints but I was able to do cross validation for the models and results were fairly consistent. Thanks for the question!
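As a rough illustration of what such a check can look like (a sketch only, not the code used for the project), the snippet below runs 5-fold cross-validation on a one-predictor linear model. The data are synthetic, and the names `disorder`, `birth_rate`, `fit_simple_ols`, and `k_fold_mse` are hypothetical. Similar error across folds is what "fairly consistent" would mean here.

```python
import random

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def k_fold_mse(xs, ys, k=5, seed=0):
    """Mean squared error on each of k held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, b = fit_simple_ols([xs[i] for i in train],
                                      [ys[i] for i in train])
        errors = [(ys[i] - (intercept + b * xs[i])) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores

# Synthetic stand-in data (assumed): a disorder score and a birth rate
# for 100 hypothetical zip codes.
rng = random.Random(42)
disorder = [rng.uniform(0, 10) for _ in range(100)]
birth_rate = [5 + 2 * d + rng.gauss(0, 1) for d in disorder]

fold_mse = k_fold_mse(disorder, birth_rate, k=5)
print([round(m, 2) for m in fold_mse])
```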

RuxinChen commented 6 years ago

While various neighborhood demographics (esp. disadvantage indicators) are controlled for, I am wondering if you have considered the effect of neighborhood gender and age composition on the teen birth rate. Presumably, a neighborhood with a higher proportion of teen males and younger teens might have a lower teen birth rate than one with more females and a more mature teen population. It would, therefore, be important to account for gender and age composition at the neighborhood level.

dpzhang commented 6 years ago

What standardization method did you use to make the zip code areas comparable?
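The thread does not record an answer to this question. One common approach, shown here purely as an assumption and not as the method used in the project, is to convert counts to rates per 1,000 so that areas of different population are comparable, and then z-score the result. The zip labels and counts below are made up.

```python
from statistics import mean, pstdev

def rate_per_1000(events, population):
    """Events per 1,000 people, so areas of different size are comparable."""
    return 1000 * events / population

def z_scores(values):
    """Center on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical records: zip label -> (teen births, female teen population)
zip_records = {
    "zip_a": (12, 800),
    "zip_b": (45, 3000),
    "zip_c": (5, 1000),
}

rates = {z: rate_per_1000(births, pop)
         for z, (births, pop) in zip_records.items()}
standardized = dict(zip(rates, z_scores(list(rates.values()))))

print(rates)
print(standardized)
```

In this toy example, `zip_a` and `zip_b` have very different raw counts but the same rate, which is the point of normalizing by population first.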