Update #1

anayak23 commented 3 years ago

Do you plan for your final project to be an extension of the mid-semester project? a. If Yes: Identify specific ideas for how you will extend your mid-semester project. The more details the better here. Do you plan to add additional data? Be sure to include which topic(s) you will incorporate: text analysis, network science, unsupervised learning, and/or spatial data.

We plan to update and expand upon our Shiny app related to health education mandates and outcomes. We plan on adding additional data surveyed by the CDC about the risky behaviors of students: for example, how many students are tested for HIV and STIs, and how many students used birth control during their last sexual encounter. This additional data could be incorporated with our original data and visually represented in the same way we currently present health outcomes. We can then further investigate how state education mandates correlate with the personal behaviors individuals adopt to protect their health and safety.

We are going to explore using unsupervised learning (k-means?) to test a hypothesis about cultural differences in the types of health education mandates. We might hypothesize that certain groupings of states will exhibit similar types of mandates; these groupings might be the states’ geographic region or their political climate.

We plan to polish the data visualizations from the Shiny app. In particular, we want to get the hover functionality on the graphs working so that it shows just the name of the state. We may also work on generally improving graphical quality.
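The k-means idea mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the project's own R code: the state names, the four 0/1 mandate indicators, and the choice of two clusters are all hypothetical placeholders, and the first k points are used as starting centroids simply to keep the result reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 0/1 indicators per state (made-up values, not real mandate data):
# (sex ed mandated, HIV ed mandated, medically accurate, abstinence stressed)
mandates = {
    "State A": (1, 1, 1, 0),
    "State B": (0, 0, 0, 1),
    "State C": (1, 1, 0, 0),
    "State D": (0, 1, 0, 1),
}
labels, centroids = kmeans(list(mandates.values()), k=2)
clusters = dict(zip(mandates, labels))
```

The resulting cluster labels could then be compared against geographic region or political climate to see whether the groupings line up with the hypothesis.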

Describe what you hope to deliver as a final product. Will your blog link to a published Shiny application? Will it incorporate an interactive map? Will it involve a predictive model that forecasts future values of some quantity using data that you’ve integrated?

The overarching goal of the blog is to inform readers about the state of sexual education and related outcomes in the United States and make a case for the need for stronger curricular requirements on the topic. In doing so, we will be conscious of the limitations of our data and thus of the conclusions we can draw from it, and do our best to inform users appropriately.

We plan to add a new tab that will be the landing page for the website, where users can learn about the project, our questions, and the datasets we used. We plan to transfer the interactive visualizations created in our Shiny app into corresponding tabs in our blog post, with clear explanations of the logic behind those visualizations. An added feature of our blog post, not present in our Shiny app, will be the hypothesis testing using unsupervised learning to investigate mandate patterns. We’re not sure exactly what this will entail but plan to refine this idea as we better understand unsupervised learning and the types of questions it can answer.

Outline a schedule for your group’s progress that will take you from now (ideas phase) to the final blog post and presentation at the end of the semester. During the last project, we had specific checkpoints for different phases of the project. Based on what you envision for your final blog post, identify checkpoints for your group and dates by which you plan to reach those checkpoints. Hold each other accountable, so you’re not waiting until the last minute to do things! In particular, you should have at least one checkpoint each week (ideally two) identifying what work you expect to complete by then.

- Scrape and incorporate new data related to risky behaviors (complete by 11/1)
- Exploratory data visualization and narrowing down topic/goals (11/1)
- Address feedback on initial Shiny app: fixing hover, commenting code to reflect who did what, line errors (11/1)
- Put the current code into blog format (11/7)
- Unsupervised learning (11/14)
- Writing out some text to accompany visualizations in blog and presentations (11/16)

Untitled (1) 2.pdf

laurenpelosi commented 3 years ago

This week, we have made progress towards understanding the challenges of scraping the particular data we need for our blog post and solving the problems that came up in our Shiny app code. However, we found this week taxing and didn’t accomplish as much as we hoped; we originally planned to have fully addressed the comments about our original Shiny app, finished scraping new data sets, and refined our research questions by the time of this update. Because of this, we’ve laid out a plan for next week to work together and generate momentum toward our goals.

While we have not been able to implement a solution to the hover functionality of our bar plot that allows users to see just the state name, we have an idea about the tactic we need for this solution (using a plotly plot rather than ggplot). The challenge has been integrating a plotly plot in a Shiny app, as few online sources show this method. We will continue to work toward this end, and plan to attend Monday’s Office Hours if we cannot solve it this weekend.

To add to the scope of our inquiry, we have determined that we will incorporate the CDC’s 2019 Youth Risk Behavior Surveillance System (YRBSS), which tracks sexual behaviors related to unintended pregnancy and sexually transmitted diseases, alcohol and drug use, dietary behaviors, and physical activity among young people in every state. We planned to have this data scraped this week; however, we realized that scraping all of it entails downloading each individual data table (one per risky behavior of interest) as an Excel file, converting it to a CSV, and joining the tables by state in our data wrangling file. This will take longer than originally anticipated.
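The join-by-state step described above might look something like the sketch below. It is written in Python purely for illustration (the project's wrangling file is not shown here), it assumes each converted CSV has a `State` column and a `Percent` column, and the table names and values are made-up placeholders rather than actual YRBSS figures.

```python
import csv
import io


def read_table(csv_text, value_column):
    """Return {state: value} from CSV text that has a 'State' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["State"]: row[value_column] for row in reader}


def join_by_state(tables):
    """Full join of {column_name: {state: value}}; missing cells become ''."""
    states = sorted(set().union(*(t.keys() for t in tables.values())))
    return [
        {"State": s, **{name: t.get(s, "") for name, t in tables.items()}}
        for s in states
    ]


# Placeholder contents standing in for two converted Excel tables.
hiv_csv = "State,Percent\nAlabama,10.0\nAlaska,12.5\n"
birth_control_csv = "State,Percent\nAlaska,30.0\nArizona,28.0\n"

tables = {
    "hiv_tested": read_table(hiv_csv, "Percent"),
    "used_birth_control": read_table(birth_control_csv, "Percent"),
}
joined = join_by_state(tables)
```

A full join is used so that a state missing from one table still appears in the result with an empty cell, which makes gaps in the downloaded data easy to spot.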

We found that it was hard for us to stay focused on these projects with midterms and the election this week. To help us stay on track with our schedule in future weeks, we decided it would be best for us to meet over Zoom more often for updates and working sessions. Our next meeting will be at 10 AM EST tomorrow, when we will catch up on the deadlines we set for ourselves this week and plan how we’ll divide our tasks for the weekend and next week.

katcorr commented 3 years ago

Thorough update. It was an ambitious plan for a nerve-wracking week. I think your plan to meet more often to catch up on the deadlines makes sense. If you think you'll need to scale back your overall plans given your progress, please include that in the next update.

Update 2: 5/5

laurenpelosi commented 3 years ago

We’re continuing to make scraping progress. We have the code to scrape the risky behaviors dataset, and now just have to manually download each Excel file. We have also scraped text files from the CDC website related to teacher behavior in every state (e.g., the percentage of teachers who teach HIV prevention). We now have to pull out the numbers we need, which are embedded in the text.
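Pulling numbers out of the scraped text could be approached with a regular expression, as in this minimal Python sketch. The layout of the CDC text files is not shown in this thread, so the sample line and the assumption that each value is followed by a `%` sign are invented for illustration.

```python
import re

# Matches an integer or decimal number immediately followed by a percent sign.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def extract_percentages(text):
    """Return every percentage found in the text as a float, in order."""
    return [float(match) for match in PERCENT.findall(text)]


# Invented sample line; the real files may be laid out differently.
sample = "HIV prevention: 85.3% of teachers; STD prevention: 80% of teachers."
values = extract_percentages(sample)
```

If the real files place the numbers in a different position or without a `%` sign, the pattern would need to be adjusted accordingly.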

We’ve decided we’ll incorporate this new data into two new plots. One will be a scatterplot with the percentage of teachers that try to increase knowledge on _ on the x-axis, and the related health outcome on the y-axis, colored by mandate. Time permitting, we will also create a bar plot with the state on the x-axis and the percentage of teachers that try to increase knowledge on _ on the y-axis, arranged by y and colored by mandate. (We will also create visualizations for our k-means clustering, as originally planned.)

The scraping process has been more time-intensive than we expected, but we are nearing the end and should be on track to achieve our original goals.

katcorr commented 3 years ago

Sounds good!

Update 3: 5/5

stat231-f20 / Blog-library-Fauci
