jonmgeiger / household-conditions

https://jonmgeiger.github.io/household-conditions

Preliminary Analysis #1

Closed jonmgeiger closed 2 years ago

jonmgeiger commented 2 years ago

Added a correlation plot, as well as plots of certain pairs of variables whose correlations are greater than 0.3.

Let me know if you have any recommendations or think it should be pared down or expanded upon at all. I'll keep working on this throughout the weekend, and feel free to add whatever other analyses you want to.
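For anyone who wants to reproduce the "correlations greater than 0.3" selection outside the HTML report, here is a minimal, dependency-free sketch in Python. It is not the code used in the analysis. The variable names (`var_a`, `var_b`, `var_c`) and the example values are hypothetical, and it assumes the 0.3 cutoff is applied to the absolute value of the Pearson correlation, which the comment above does not state.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant column has no defined correlation.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.3):
    """Return (name_a, name_b, r) for every pair with |r| > threshold.

    `columns` maps a column name to its list of values.
    Pairs are sorted from strongest to weakest correlation.
    """
    pairs = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        # NaN compares as False, so undefined correlations are skipped.
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical example data, not taken from the project.
example = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 4, 6, 8, 10],
    "var_c": [1, -1, 0, -1, 1],
}

for name_a, name_b, r in correlated_pairs(example):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Using the absolute value keeps strongly negative relationships in the list too. If only positive correlations were intended, the `abs(r)` check would become `r > threshold`.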

jonmgeiger commented 2 years ago

Don't mind all the above commits, I was just being dumb.

noelgoodwin commented 2 years ago

This is all super helpful, Jon. I think we should consider filtering the data even further to only include school districts with more than 100-150 children. Below that, most if not all of the entries for the other variables are 0 or have margins of error between 0 and 100, which will be problematic. After doing some research, I'm still not sure how to approach the margin of error given how it's formatted. Looking forward to getting some insight into that tomorrow. I added the correlation matrix and a few more context notes to the slideshow for this week.

jonmgeiger commented 2 years ago

Agreed on the filtering. Margins of error will definitely be an interesting question to ask. I think it'll be important to come up with a research question before we eliminate any data, but yes, it'll probably be the case that we get rid of all the data that just skews our model.

As for the rows whose margins of error go from 0%-100%, is that just attributable to the low student count, or might it be an indication of implicitly missing data?

jonmgeiger commented 2 years ago

@noelgoodwin and @joppaa, take a look over the updated HTML file for the analysis and let me know what you think. For plotting, I filtered out the districts with 100 students or fewer as per Noel's recommendation, and just overall improved the formatting.
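As a rough illustration of the filter described above (drop districts with 100 students or fewer), here is a small Python sketch. It is only an assumption about how the rule could be written, not the code behind the HTML file. The field names `district` and `children` and the sample rows are hypothetical.

```python
MIN_CHILDREN = 100  # districts at or below this count are dropped


def filter_districts(rows, min_children=MIN_CHILDREN, count_key="children"):
    """Keep only rows whose child count is strictly above `min_children`.

    Rows with a missing count are dropped as well, since they cannot be
    shown to meet the threshold.
    """
    return [
        row
        for row in rows
        if row.get(count_key) is not None and row[count_key] > min_children
    ]


# Hypothetical example rows, not taken from the project data.
rows = [
    {"district": "A", "children": 40},
    {"district": "B", "children": 100},
    {"district": "C", "children": 2500},
    {"district": "D", "children": None},
]

kept = filter_districts(rows)
print([row["district"] for row in kept])
```

Keeping the threshold as a parameter makes it easy to try the 150 cutoff Noel suggested by calling `filter_districts(rows, min_children=150)`.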