Update 1: PUG Blog Final Project

Our final project will be an extension of our mid-semester project.

We plan to add more variables to our existing dataset and to examine more years of census data, in the hope of making predictions. We will use spatial data for our visualizations; specifically, we hope to use leaflet to create a more interactive and accessible application. Because we are continuing to work with information on cities across the United States, a map is a natural way to visualize our dataset. In short, we will add more cities and more information on those cities, present this data over time, and make our blog/application more accessible through map visualizations.

We hope to incorporate an interactive map into our final product. This will probably require a shiny app, although if we can make it work using leaflet alone we might opt for that approach. We are imagining a map of the United States with a circle for each city, where the size of each circle corresponds to that city's value for a variable of the user's choice. The user will be able to click on each city to see some of its statistics. We also want to expand our dataset to span multiple years, and our interactive map will include a slider that lets the user choose which year to display, or perhaps an animation of the data through time. We might put together a few other graphics showing summary statistics, or try building a model to predict cities' values for a certain variable if the data lends itself to that, but mostly we want to focus our efforts on making one really great map.

We decided on three checkpoints before the final blog post and presentation are due. By next Tuesday (11/10), we are aiming to have pulled more data from the tidy census package. This new data will include more variables of interest and more years (not just 2018). By the following Tuesday (11/10), we hope to have working code for our map and any other coded components we decide to include in the blog. We will then spend the last week assembling the components of our blog, formatting the information, and writing the text of the post. This will be done by 11/17, in time for the start of presentations that Tuesday. If we have time, we also hope to pre-record our presentation. As the project progresses, we will divide up responsibilities and tasks accordingly.

Due Date Summary
By 11/3: Finalize new dataset, pull new variables + more years
By 11/10: Have a draft of the map and the necessary code for other components
By 11/17: Write everything up, format the blog, and prepare presentation

Steedman, Grace, Mike, Rodrigo

@glecates @rod144 @sjenkins23 @mjsantos6

Excellent ideas, team! I agree that expanding on your shiny project for this blog project makes sense, and I look forward to this extension. I also agree on focusing on one or two really great visualizations, and would recommend either:

A shiny app that allows the user to select different variables or time that would update the map; OR
Embed one or two main visualizations in the website (no shiny app), e.g. (a) a leaflet plot that allows the user to zoom in/out/around and, when the user clicks on a particular city, displays additional info on that city; and/or (b) you mentioned animating a plot over time, which could be really interesting. The gganimate package lets you add animation to plots created with ggplot. There's an example here that uses gganimate to display gapminder data by continent over time. Maybe you could create a similar plot (for a select number of cities) displaying a particular variable over time?

"By next Tuesday (11/10), we are aiming to have pulled more data from the tidy census package. This new data will include more variables of interest and more years (not just 2018). By the following Tuesday (11/10), we hope to have working code for our map and any other coded components we decide to include in the blog. " I assume that first date was a typo -- by "next Tuesday" you had meant 11/3?

I'm really looking forward to your blog post!

Update 1: 10/10

We spent the week compiling data for our map. Although we are a little behind our original schedule, we have a working dataset (dataset.csv) that includes more variables and years from the tidycensus package, along with latitude and longitude coordinates for each city. Below we explain how we created this dataset and our plan moving forward.

Variables: We had to go through each of the tidycensus variables we used in our Shiny App and check whether each code referred to the same quantity in every one of our desired years (2009 - 2018). Some of our variables were not available for every year, so we searched the tidycensus ACS variables again and found additional variables that were available for all of our years. Below is the list of codes we decided to use:

population = "B01003_001"
median_income = "B19326_001"
f_born = "B05002_013"
median_value = "B25077_001"
married_hh = "B11001_003"
total_hh = "B11001_001"
houses_for_sale = "B25004_004"
white_alone = "B02001_002"
black_alone = "B02001_003"
median_age_female = "B01002_003"
median_age_male = "B01002_002"

Adding more variables is straightforward, so if we have time, we can easily include more.

Years: On further exploration of the tidycensus package, we discovered that American Community Survey data was only available for 2009 - 2018, so we decided to focus on these years for our dataset. To compile the data for all of these years, we used a for loop, which can be found in the steedman_scrape_blog.R file in our scraping folder.
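The year loop can be sketched as follows. This is a minimal illustration in Python, not the code in steedman_scrape_blog.R: `stack_years` and the `fetch` callback are hypothetical names, where `fetch` stands in for whatever per-year tidycensus query the real script runs. Each year's rows are tagged with their year and appended to one long table.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def stack_years(years: Iterable[int], fetch: Callable[[int], List[Row]]) -> List[Row]:
    """Call `fetch` once per year and stack every result into one long table."""
    combined: List[Row] = []
    for year in years:
        for row in fetch(year):
            tagged = dict(row)      # copy so the fetched rows are left unchanged
            tagged["year"] = year   # remember which survey year the row came from
            combined.append(tagged)
    return combined


# Hypothetical stand-in for a real per-year ACS query; values are placeholders.
def fake_fetch(year: int) -> List[Row]:
    return [{"NAME": "Metro A", "estimate": year - 2000},
            {"NAME": "Metro B", "estimate": year - 1990}]


rows = stack_years(range(2013, 2019), fake_fetch)
```

In R, the same pattern is typically written by looping over the years, adding a year column to each result, and combining the pieces with dplyr::bind_rows().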

When compiling the dataset, we discovered that metro areas were named differently in 2009 - 2012 (e.g., Chicago is labelled Chicago-Naperville-Elgin in 2018 but Chicago-Joliet-Naperville in 2012). The metro areas are named consistently from 2013 - 2018, so for now we decided to focus on these years instead of 2009 - 2018. If you have any suggestions for how to solve this issue, please let us know and we will go back and try to add in more years!
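One possible way to bring back the earlier years, offered as an untested suggestion: get_acs() in tidycensus returns a GEOID column alongside NAME, and for metro areas that identifier is a Census Bureau code we would expect to be more stable than the display name. If that holds, joining years on GEOID and relabelling every row with its most recent name would line the years up. Any metro whose code changed or whose boundaries were redrawn would still need checking by hand. The Python sketch below shows only the relabelling logic; `latest_names`, `harmonize_names`, and the GEOID value are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def latest_names(rows: List[Row]) -> Dict[str, str]:
    """Map each GEOID to the NAME it carries in its most recent year."""
    best: Dict[str, Tuple[int, str]] = {}
    for row in rows:
        geoid, year, name = str(row["GEOID"]), int(row["year"]), str(row["NAME"])
        if geoid not in best or year > best[geoid][0]:
            best[geoid] = (year, name)
    return {geoid: name for geoid, (_, name) in best.items()}


def harmonize_names(rows: List[Row]) -> List[Row]:
    """Relabel every row with the latest name for its GEOID, so years line up."""
    names = latest_names(rows)
    return [{**row, "NAME": names[str(row["GEOID"])]} for row in rows]


# Hypothetical GEOID; the two names are the Chicago labels mentioned above.
rows = [
    {"GEOID": "G001", "NAME": "Chicago-Joliet-Naperville", "year": 2012, "estimate": 1},
    {"GEOID": "G001", "NAME": "Chicago-Naperville-Elgin", "year": 2018, "estimate": 2},
]
harmonized = harmonize_names(rows)
```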

Latitude and Longitude: We joined our dataset with the mdsr WorldCities dataset used in PS8A to get latitude and longitude coordinates for our US cities, and hardcoded the coordinates for cities that were not in the WorldCities dataset.
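The join-then-fallback step can be sketched like this. In dplyr terms it corresponds to a left_join() against WorldCities followed by coalesce() to fill in the hardcoded coordinates; the Python below spells out the same logic, and `attach_coordinates` plus the sample inputs are hypothetical. Matching on city name alone is an assumption: many cities share a name, so a real join may also need to restrict WorldCities to US rows. Returning the still-missing names shows which cities need hand-entered coordinates.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
Row = Dict[str, object]


def attach_coordinates(
    cities: List[Row], lookup: Dict[str, Coord], hardcoded: Dict[str, Coord]
) -> Tuple[List[Row], List[str]]:
    """Add lat/lng to each city: use the lookup table first, then the hardcoded fallback.

    Returns the rows that received coordinates and the names still missing any.
    """
    matched: List[Row] = []
    missing: List[str] = []
    for row in cities:
        name = str(row["city"])
        coord = lookup.get(name)
        if coord is None:
            coord = hardcoded.get(name)   # fall back to hand-entered values
        if coord is None:
            missing.append(name)          # neither source knows this city
            continue
        lat, lng = coord
        matched.append({**row, "lat": lat, "lng": lng})
    return matched, missing


# Illustrative inputs only; these are not values taken from WorldCities or our dataset.
world_cities = {"Chicago": (41.88, -87.63)}
hand_entered = {"Boise": (43.62, -116.20)}
rows, missing = attach_coordinates(
    [{"city": "Chicago"}, {"city": "Boise"}, {"city": "Nowhere"}],
    world_cities,
    hand_entered,
)
```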

Next Steps: The next step is to start working on the code for our map. PS 8B was a helpful review of spatial visualization, and we plan to use many of the techniques from Lab 8 and PS 8B to create our map of US cities. As explained in Update 1, we also plan to add interactivity using leaflet.

As stated at the beginning of this update, we are a little behind our original schedule, mainly because of general end-of-semester stress. Moving forward, with no PSA this weekend and our final PSB due on Friday, it should be easier to prioritize our blog project.

To adjust our checkpoints, we are planning to meet this weekend to discuss our map code. We then hope to have a working map by Wednesday night/Thursday, which will leave us the weekend and early the following week to design, plan, and write the content of the actual blog. As stated in Update 1, if we have time, we would also like to pre-record our presentation.

Thorough update, and good plan for moving forward!

Update 2: 5/5

Since Update 2, we have been working on building the substance of our blog post.

We met on Sunday to plan our next steps after Update 2. Slightly changing our original plan, we decided to focus on two main blog components: a predictive model and an interactive map. To divide up the work, Steedman and Mike will focus on the code for the model, and Grace and Rodrigo will focus on the code for the interactive map. Both components will live within a shiny application.

Model: The predictive component of our blog post will be a shiny app in which the response variable is the change in population from 2013-2018. The user will select a variable of interest and then see whether that variable is a significant predictor of the change in a city's population from 2013-2018. The app's output will include a scatter plot with a line of best fit and R output that tells the user whether the selected variable is a significant predictor.
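For a single predictor, the significance check the app reports comes down to the slope's t-statistic in a simple linear regression; in R, summary(lm(...)) prints this t value and its p-value in the coefficient table. The Python sketch below computes the same least-squares quantities by hand for illustration. `simple_regression` is a hypothetical name, and the data are made-up numbers chosen so the arithmetic is easy to check.

```python
import math
from typing import Dict, Sequence


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Least-squares fit y = intercept + slope * x, plus R^2 and the slope's t-statistic."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope uses n - 2 residual degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "t_slope": slope / se_slope,
        "df": n - 2,
    }


fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Here the slope is 0.6 and t is about 2.12 on 3 degrees of freedom, which falls short of the two-sided 5% cutoff of roughly 3.18 for 3 degrees of freedom. Significance therefore depends on comparing t with the t distribution (the p-value R reports), not on the size of the slope alone.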

At this point, we have working code for the predictive model and shiny application, though it may require minor updates or changes moving forward.

Leaflet Map: Our vision for the interactive map is to have a map where the user can select a year of interest and variable of interest. The size/color of the city markings on the map will correspond to the selected variable’s value for that particular city in that particular year. We would also like to have the name of the city and value of the variable to show up when the user clicks on each city. The map will act as a new way to visualize some of the regional relationships we discovered in our Midterm Project.
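The marker logic described above can be sketched as follows. It filters to the chosen year, then scales each circle so that its area, rather than its radius, is proportional to the selected variable; because readers tend to judge circles by area, scaling the radius linearly would exaggerate differences between cities. The radius is meant to be in pixels, the unit R leaflet's addCircleMarkers() uses for its radius argument. `circle_markers`, `max_radius`, and the sample rows are hypothetical, and the sketch assumes the variable is non-negative.

```python
import math
from typing import Dict, List

Row = Dict[str, object]


def circle_markers(rows: List[Row], year: int, variable: str, max_radius: float = 20.0) -> List[Row]:
    """Build one marker per city for `year`, sized so circle area tracks `variable`."""
    subset = [r for r in rows if r["year"] == year and r.get(variable) is not None]
    if not subset:
        return []
    largest = max(float(r[variable]) for r in subset)
    markers: List[Row] = []
    for r in subset:
        value = max(float(r[variable]), 0.0)   # assumes non-negative data; clamp just in case
        # Radius grows with the square root of the value, so area is proportional to it.
        radius = max_radius * math.sqrt(value / largest) if largest > 0 else 0.0
        markers.append({
            "city": r["city"],
            "lat": r["lat"],
            "lng": r["lng"],
            "radius": radius,
            "popup": f"{r['city']}: {r[variable]:,}",   # text shown when the city is clicked
        })
    return markers


# Made-up rows for illustration only.
sample = [
    {"city": "A", "lat": 40.0, "lng": -75.0, "year": 2018, "population": 4_000_000},
    {"city": "B", "lat": 35.0, "lng": -90.0, "year": 2018, "population": 1_000_000},
    {"city": "A", "lat": 40.0, "lng": -75.0, "year": 2017, "population": 3_900_000},
]
markers = circle_markers(sample, 2018, "population")
```

In a shiny version, `year` and `variable` would come from the year and variable inputs the user selects.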

At this point, we have working code for the leaflet map and just need to put it into a shiny app.

Next Steps: We are on track to finish the coded components of our blog post by Wednesday/Thursday. Having class time on Thursday to work on the project will be helpful if we run into any issues. We will then spend the weekend building the written and visual components of the post in our index.rmd.

Great progress. Sounds good!

Update 3: 5/5
