stat231-s23 / blog2-TheBigThree

Lau ten Cate, Michael Peralas, Muhammad Ahsan Tahir

blog2-TheBigThree #1

Michaelrp525 opened 1 year ago

Michaelrp525 commented 1 year ago

Extension of the mid-semester project: Our final project will extend our mid-semester project, in which we gathered data about the teams in the Premier League. We will incorporate network science to create an interactive visualization that displays the relationships between Premier League teams based on their head-to-head records. We plan to gather additional head-to-head match data by scraping websites such as Transfermarkt or using APIs like the football-data.org API. This data will be used to create a graph whose nodes represent the Premier League teams, whose edges represent the difference in the number of wins between each pair of teams, and whose colors indicate which team holds the better head-to-head record.
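To make the planned structure concrete, here is a minimal sketch of how such a graph could be assembled in R with igraph; the `h2h` data frame, its team pairs, and its win counts are hypothetical placeholders for the scraped head-to-head data, not real results.

```r
library(igraph)

# hypothetical head-to-head records: one row per team pair
h2h <- data.frame(
  team1 = c("Arsenal", "Arsenal", "Chelsea"),
  team2 = c("Chelsea", "Liverpool", "Liverpool"),
  wins1 = c(60, 40, 50),
  wins2 = c(55, 70, 48)
)

edges <- data.frame(
  from   = h2h$team1,
  to     = h2h$team2,
  # edge weight: absolute difference in wins between the two teams
  weight = abs(h2h$wins1 - h2h$wins2),
  # edge color: which team leads the head-to-head record
  color  = ifelse(h2h$wins1 > h2h$wins2, "red", "blue")
)

g <- graph_from_data_frame(edges, directed = FALSE)
plot(g, edge.width = E(g)$weight / 5, edge.color = E(g)$color)
```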

Final product: Our final product will be an interactive blog post that features a Shiny application. The Shiny app will allow users to select a specific Premier League team and generate a graph displaying the head-to-head relationships between the selected team and all other teams in the league. The graph will be visually appealing and informative, enabling users to better understand the dynamics between teams in the Premier League.
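A bare-bones sketch of that interaction, assuming the igraph object `g` from the previous sketch: the user picks a team and the app redraws only that team's head-to-head edges. This is illustrative, not the final app.

```r
library(shiny)
library(igraph)

ui <- fluidPage(
  titlePanel("Premier League Head-to-Head Network"),
  selectInput("team", "Choose a team:", choices = V(g)$name),
  plotOutput("network")
)

server <- function(input, output) {
  output$network <- renderPlot({
    # keep only edges incident to the selected team
    keep <- incident(g, input$team)
    sub  <- subgraph.edges(g, keep)
    plot(sub, edge.width = E(sub)$weight / 5, edge.color = E(sub)$color)
  })
}

shinyApp(ui, server)
```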

Schedule and checkpoints:

- 4/13: Submit Blog Plan (10 pts)
- 4/14-4/19: Gather and clean additional data (head-to-head match data)
  - Checkpoint 1 (4/16): Complete data gathering
  - Checkpoint 2 (4/19): Complete data cleaning
- 4/20: Status Update 1 (5 pts)
- 4/21-4/26: Develop the Shiny application
  - Checkpoint 3 (4/23): Complete basic functionality (user input, graph generation)
  - Checkpoint 4 (4/26): Polish the Shiny app (visual design, user experience)
- 4/27: Status Update 2 (5 pts)
- 4/28-5/3: Write the blog post, integrate the Shiny app, and prepare the presentation
  - Checkpoint 5 (4/30): Complete the first draft of the blog post
  - Checkpoint 6 (5/2): Integrate the Shiny app into the blog post
  - Checkpoint 7 (5/3): Prepare the presentation
- 5/4: Presentation/feedback session (20 pts)
- 5/9: Submit Final Blog Post (70 pts) and Reflection II (10 pts)

katcorr commented 1 year ago

Sounds good! Nice details to your schedule and checkpoints. I look forward to the blog post!

Proposal: 10/10

lautencate commented 1 year ago

Update 4/20: We have made progress wrangling our data and are currently cleaning it so it is easier to turn into a network. We will work over the weekend to finalize the cleaning and begin creating graphs that show the head-to-head results of the Premier League teams.

katcorr commented 1 year ago

Ok! I don't see any of the code or datasets in this repo? Please keep them here in this repo.

Update 1: 4/5

Michaelrp525 commented 1 year ago

Progress: Over the past few weeks, we've encountered some challenges with our project. The main issue we faced was finding the required data to create the network graphs. The previous dataset we had wasn't sufficient for the network graph we wanted to create. After an extensive search, we discovered a public database called Openfootball, which has the data we need. However, we needed to improve our SQL skills to interact with and use this database effectively. We gained a better understanding of SQL in today's class (4/27) and have successfully imported the Openfootball database.
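A hedged sketch of that import step, assuming the Openfootball data has been loaded into a local SQLite file; the file, table, and column names below are illustrative, not the database's actual schema.

```r
library(DBI)
library(RSQLite)

# connect to a local SQLite copy of the Openfootball data (path assumed)
con <- dbConnect(SQLite(), "openfootball.db")
dbListTables(con)  # inspect which tables the database actually contains

# pull one hypothetical results table into R
matches <- dbGetQuery(con, "SELECT team1, team2, score1, score2 FROM matches")
dbDisconnect(con)
```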

Challenges and Schedule Adjustments: Due to the difficulties in finding the data and the unexpected need to learn SQL, we have fallen behind our initial schedule. We are now in the process of data wrangling, which we anticipate will be challenging. As a result, we need to adjust our checkpoints and come up with a plan to get back on track.

Plan to Get Back on Track: To get back on track, we will dedicate more time to the project by working more on weekends and allocating additional hours during the week.

katcorr commented 1 year ago

For wrangling, unless the size is prohibitive, it might make most sense to use collect() early on to create a regular R dataframe from the SQL tables and use the usual tidyverse functions to do most of the wrangling, since you're more familiar with those than SQL.
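For example (reusing the assumed SQLite file and `matches` table from the earlier sketch), `collect()` materializes the SQL table as an ordinary data frame, after which the wrangling is plain dplyr:

```r
library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)  # enables tbl() on DBI connections

con <- dbConnect(SQLite(), "openfootball.db")

# collect() early: bring the whole table into R as a regular data frame
matches_df <- tbl(con, "matches") %>% collect()
dbDisconnect(con)

# then wrangle with familiar tidyverse verbs, e.g. tally head-to-head wins
h2h_counts <- matches_df %>%
  mutate(winner = case_when(
    score1 > score2 ~ team1,
    score2 > score1 ~ team2,
    TRUE            ~ NA_character_   # draws
  )) %>%
  count(team1, team2, winner)
```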

Update 2: 5/5