Open anayak23 opened 3 years ago
This week, we have made progress towards understanding the challenges of scraping the particular data we need for our blog post and solving the problems that came up in our Shiny app code. However, we found this week taxing and didn’t accomplish as much as we hoped; we originally planned to have fully addressed the comments about our original Shiny app, finished scraping new data sets, and refined our research questions by the time of this update. Because of this, we’ve laid out a plan for next week to work together and generate momentum toward our goals.
While we have not been able to implement a solution to the hover functionality of our bar plot that allows users to see just the state name, we have an idea about the tactic we need for this solution (using a plotly plot rather than ggplot). The challenge has been integrating a plotly plot in a Shiny app, as few online sources show this method. We will continue to work toward this end, and plan to attend Monday’s Office Hours if we cannot solve it this weekend.
To broaden the scope of our inquiry, we will incorporate the CDC’s 2019 Youth Risk Behavior Surveillance System (YRBSS), which tracks sexual behaviors related to unintended pregnancy and sexually transmitted diseases, alcohol and drug use, dietary behaviors, and physical activity among young people in every state. We planned to have this data scraped this week. However, we realized that scraping it entails downloading each individual data table (one per risky behavior of interest) as an Excel file, converting it to a CSV, and joining the tables by state in our data wrangling file, which will take longer than originally anticipated.
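The join-by-state step described above could look roughly like the sketch below. It is written in Python with only the standard library, purely for illustration (the project itself lives in R). The column names `state` and `percent`, the behavior names, and the numbers are all assumptions, not values from the YRBSS tables.

```python
import csv
import io


def join_by_state(tables):
    """Merge per-behavior CSV tables into one row per state.

    `tables` maps a behavior name to CSV text that has a "state" column
    and a "percent" column. Both column names are assumptions made for
    this sketch; the real exported tables may be laid out differently.
    """
    merged = {}
    for behavior, csv_text in tables.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            state = row["state"].strip()
            merged.setdefault(state, {})[behavior] = float(row["percent"])

    behaviors = list(tables)
    # Fill gaps with None so every row has the same columns (a full outer join).
    return [
        {"state": state, **{b: merged[state].get(b) for b in behaviors}}
        for state in sorted(merged)
    ]


# Made-up values, only to show the shape of the result.
example = join_by_state({
    "behavior_a": "state,percent\nOhio,10.5\nMaine,12.0\n",
    "behavior_b": "state,percent\nOhio,20.0\n",
})
```

Keeping states that are missing from some tables (with `None` in the gap) makes it easy to see which downloads are still incomplete.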
We found that it was hard for us to stay focused on these projects with midterms and the election this week. To help us stay on track with our schedule in future weeks, we decided it would be best for us to meet over Zoom more often for updates and working sessions. Our next meeting will be 10AM EST tomorrow for us to catch up on the deadlines we set for ourselves from this week and plan how we’ll divide our tasks for the weekend and next week.
Thorough update. It was an ambitious plan for a nerve-wracking week. I think your plan to meet more often to catch up on the deadlines makes sense. If you think you'll need to scale back your overall plans given your progress, please include that in the next update.
Update 2: 5/5
We’re continuing to make scraping progress. We have the code to scrape the risky behaviors dataset, and now just have to manually download each Excel file. We have also scraped text files from the CDC website related to teacher behavior in every state (e.g., the percentage of teachers that teach HIV prevention). We now have to pull out the numbers we need, which are embedded in the text.
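One possible way to pull numbers out of the scraped text is a regular expression that looks for percentages. The sketch below is in Python for illustration only; the sample sentence is invented, and the real CDC text may phrase its figures differently, so the pattern is an assumption that would need adjusting.

```python
import re

# Matches numbers such as "87" or "87.5" followed by "%" or the word "percent".
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|percent\b)")


def extract_percentages(text):
    """Return every percentage found in `text`, in order, as floats."""
    return [float(match) for match in PERCENT_RE.findall(text)]


# Invented sample text, not taken from the CDC files.
sample = (
    "Hypothetical State: 87.5% of teachers taught HIV prevention; "
    "62 percent taught STD prevention."
)
values = extract_percentages(sample)
```

Requiring the `%` sign or the word "percent" keeps years and other stray numbers in the text out of the result.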
We’ve decided we’ll incorporate this new data into two new plots. One will be a scatterplot with the percentage of teachers that try to increase knowledge on _ on the x-axis and the related health outcome on the y-axis, colored by mandate. Time permitting, we will also create a bar plot with the state on the x-axis and the percentage of teachers that try to increase knowledge on _ on the y-axis, with bars ordered by that percentage and colored by mandate. (We will also create visualizations for our k-means clustering, as originally planned.)
The scraping process has been more time-intensive than we expected, but we are nearing the end and should be on track to achieve our original goals.
Sounds good!
Update 3: 5/5
We plan to update and expand upon our Shiny app related to health education mandates and outcomes. We plan to add data surveyed by the CDC about the risky behaviors of students. For example, how many students are tested for HIV and STIs, and how many students used birth control during their last sexual encounter? This additional data could be incorporated with our original data and represented visually in the same way we currently present health outcomes. We can then investigate how state education mandates correlate with the personal behaviors individuals adopt to ensure their health and safety.
We are going to explore using unsupervised learning (k-means(?)) to test a hypothesis about cultural differences in the types of health education mandates. We might hypothesize that certain groupings will exhibit similar types of mandates; these groupings might be the states’ geographic region or their political climate.
We also plan to polish the data visualizations from the Shiny app. In particular, we want to get the hover functionality on the graphs working so that it shows just the name of the state. We may also work on generally improving graphical quality.
The overarching goal of the blog is to inform readers about the state of sexual education and related outcomes in the United States and make a case for the need for stronger curricular requirements on the topic. In doing so we will be conscious of the limitations of our data and thus of the conclusions we can draw from it, and do our best to inform users appropriately. We plan to add a new tab that will be the landing page for the website, where users can learn about the project, our questions, and the datasets we used. We plan to transfer the interactive visualizations created in our Shiny app into corresponding tabs in our blog post, with clear explanations of the logic behind those visualizations. An added feature of our blog post, not present in our Shiny app, will be the hypothesis testing using unsupervised learning to investigate mandate patterns. We’re not sure exactly what this will entail but plan to refine this idea as we better understand unsupervised learning and the types of questions it can answer.
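To make the unsupervised-learning idea a bit more concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python, for illustration only. The two numeric features per state and all of the values are made up. Whether k-means suits the actual mandate data is an open question, not something this sketch settles.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster `points` (tuples of numbers) into `k` groups.

    Returns (labels, centroids). Starting centroids are sampled from the
    data with a fixed seed so repeated runs give the same answer.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical features per state, e.g. two mandate-related scores.
states = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(states, k=2)
```

The resulting cluster labels could then be compared against region or political climate to see whether the groupings line up with the hypothesis.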
- Scrape and incorporate new data related to risky behaviors (Complete by 11/1)
- Exploratory data visualization and narrowing down topic/goals (11/1)
- Address feedback on initial Shiny app (fixing hover, commenting code to reflect who did what, line errors) (11/1)
- Put the current code into blog format (11/7)
- Unsupervised learning (11/14)
- Writing out some text to accompany visualizations in blog and presentations (11/16)