stat231-s21 / Blog-Lebron-Warrior-Hawks

Repository for PUG Blog Project – Lebron Warrior Hawks
https://stat231-s21.github.io/Blog-Lebron-Warrior-Hawks/

Update 1: Our Plan #1

Open kjinsanity opened 3 years ago

kjinsanity commented 3 years ago

This PUG Blog project will build upon our mid-semester presentation. We will scrape data on where players shoot from each season to continue our investigation of how the league's play style has evolved. This data is provided by the official NBA website. We will also explore the ballr package in R, which is useful for NBA analytics. The topics we hope to incorporate in this extension of our mid-semester project are unsupervised learning and spatial data.

Our blog will tell the story of the NBA's long-term trends, combining the statistical analysis from our first project with analysis of shot chart data. The shot chart data will cover the types of shots the league as a whole took in different years. In addition, we will examine shot data for various players and teams. Our project will include a Shiny app embedded in the webpage. We will also use unsupervised learning to find pairs of players with similar statistical profiles, or to find the historical equivalents of current top players among past Hall of Famers. With this analysis, we aim to discover surprising comparisons between different players.

By the first checkpoint (5/4), we hope to complete all the necessary data collection and wrangling of the shot map data, as well as the scraping of career data for past Hall of Famers.

By the second checkpoint (5/11), we hope to have finished our Shiny application and visualizations so we can spend the last few days touching up our presentation.

katcorr commented 3 years ago

@kjinsanity @Baraza10699 @kma7261

Cool find with the ballr package -- it looks like you could make some pretty cool visualizations, and it can be a great experience to explore working with a new package. One idea that could be fun, since you mentioned wanting to look at shot location over the years, is to use gganimate to create an animated graphic where the shot visualization (like one created using the ballr package) changes with each year. I'd be happy to help you implement the gganimate part, if you're interested (or, this could be too ambitious depending on how complex/challenging it is to work with the ballr package).

I would encourage you to have more of a detailed schedule plan for your group (they don't necessarily have to align with the class updates), e.g., are you planning to meet on a weekly or bi-weekly basis? or check in with each other through other means? will you each take the lead on writing certain parts of the blog, or work on the writing all together?

I'm really looking forward to your blog post!

Update 1: 10/10

kjinsanity commented 3 years ago

Update 1

We spent this week obtaining NBA shot data from the ballr package and the NBA Stats API. Due to the size of the dataset (0.5 GB and over 1,900,000 observations), running the code took a few hours, but we now have a comprehensive .csv file of all the shots taken by each player in every season from 2010 to 2021 (up to the most recent NBA Stats update as of May 3rd, 2021).

In addition, we have collected data for each player in every NBA season since the 1981-1982 season. This will allow us to build our unsupervised learning component, where we find the most similar historical season for each current player in 2021 (excluding the player’s own previous seasons).

We will begin working on the visualizations this weekend (a gganimate animation of the shot map by year, potentially filtering out restricted-area shots) and dig deeper into unsupervised learning.

katcorr commented 3 years ago

@kjinsanity @Baraza10699 @kma7261

Great, glad to hear you're on track with the schedule you originally set! I received another email from GitHub last night:

Git LFS has been disabled on the organization stat231-s21 because you’ve exceeded your data plan by at least 150%. Please purchase additional data packs to cover your bandwidth and storage usage:

  https://github.com/organizations/stat231-s21/billing/data/upgrade

Current usage as of 05 May 2021 12:43AM UTC:

  Bandwidth: 1.51 GB / 1 GB (151%)   Storage: 0.5 GB / 1 GB (50%)

Teddy and Kevin M., were you able to pull the repo so that the large shots csv is on your computers? I'm not sure if you'll be able to now; I haven't received the above message before from GitHub, so I don't know exactly how this is affecting our course organization. But, let me know if you're having trouble moving forward because you can't access the csv file!

Update 2: 5/5

katcorr commented 3 years ago

@kjinsanity @Baraza10699 @kma7261

To clarify -- please let me know once you have pulled total_shot_data.csv to your local working space. Then, please move total_shot_data.csv to a different location (outside your RStudio project) on your local machine. I will then delete total_shot_data.csv from the repo. (If you only have it in your RStudio project, then when you pull the repo again, you could lose it -- which is why I ask that you copy it somewhere else on your machine.)

kjinsanity commented 3 years ago

Update 2

We spent the past week creating visualizations for our data. Using the shot map data set, we generated a heat map density plot of all the shots taken in the NBA seasons from 2010 to 2021. The plots show visually how the league’s play style has changed, mainly shifting away from mid-range shots and toward three-point shots. Additionally, we showed numerically how many more shots players have been taking from three-point range (and, by extension, how many fewer mid-range jump shots they have been taking).
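As a rough illustration of the numeric comparison described above, here is a minimal sketch in Python (the project itself uses R). It tallies the share of attempts per shot zone in each season. The zone labels, the `(season, zone)` row layout, and the toy rows are all assumptions made for this example; they are not the actual columns or values of total_shot_data.csv.

```python
from collections import Counter, defaultdict

# Toy shot log with made-up rows (NOT real NBA data).
# Each row is (season, zone); the zone labels are hypothetical.
SHOTS = [
    (2010, "mid_range"), (2010, "mid_range"), (2010, "three"), (2010, "paint"),
    (2021, "three"), (2021, "three"), (2021, "mid_range"), (2021, "paint"),
]


def zone_share_by_season(shots):
    """Return {season: {zone: fraction of that season's attempts}}."""
    counts = defaultdict(Counter)
    for season, zone in shots:
        counts[season][zone] += 1

    shares = {}
    for season in sorted(counts):
        total = sum(counts[season].values())
        shares[season] = {
            zone: n / total for zone, n in counts[season].items()
        }
    return shares


if __name__ == "__main__":
    for season, by_zone in zone_share_by_season(SHOTS).items():
        print(season, by_zone)
```

Comparing the `three` and `mid_range` fractions across seasons gives the kind of year-over-year shift the heat maps show visually.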

Furthermore, we created a Shiny app where the user can specify a player in the 2021 season and the app finds the most statistically similar player-season from the past 20 seasons. We calculated the Euclidean distance across all stats (advanced and total) between each player-season and the selected player’s 2020-2021 season. This is done after normalizing each statistic using the method described in class.
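The nearest-season lookup described above can be sketched roughly as follows, in Python for illustration (the app itself is written in R). The thread does not say which normalization was taught in class, so this sketch assumes z-score standardization. The stat names, function names, and toy rows are hypothetical and the numbers are made up.

```python
import math
from statistics import mean, pstdev

# Toy player-season rows with made-up numbers (NOT real NBA statistics).
SEASONS = [
    {"player": "A", "season": 2021, "pts": 25.0, "ast": 8.0, "reb": 7.0},
    {"player": "A", "season": 2015, "pts": 24.0, "ast": 8.0, "reb": 7.0},
    {"player": "B", "season": 2004, "pts": 24.0, "ast": 7.0, "reb": 8.0},
    {"player": "C", "season": 1996, "pts": 10.0, "ast": 2.0, "reb": 12.0},
    {"player": "D", "season": 2010, "pts": 18.0, "ast": 3.0, "reb": 4.0},
]
STATS = ["pts", "ast", "reb"]


def standardize(rows, stats):
    """Return copies of rows with each stat z-scored (assumed normalization)."""
    scaled = [dict(r) for r in rows]
    for s in stats:
        values = [r[s] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in scaled:
            # A constant column carries no information; map it to 0.
            r[s] = (r[s] - mu) / sigma if sigma > 0 else 0.0
    return scaled


def most_similar(rows, player, season, stats):
    """Find the closest player-season by Euclidean distance,
    skipping every season that belongs to the selected player."""
    scaled = standardize(rows, stats)
    target = next(
        r for r in scaled if r["player"] == player and r["season"] == season
    )

    def distance(r):
        return math.sqrt(sum((r[s] - target[s]) ** 2 for s in stats))

    candidates = [r for r in scaled if r["player"] != player]
    best = min(candidates, key=distance)
    return best["player"], best["season"], distance(best)


if __name__ == "__main__":
    print(most_similar(SEASONS, "A", 2021, STATS))
```

Standardizing first keeps high-magnitude stats such as total points from dominating the distance; excluding the selected player's own rows mirrors the plan to leave out that player's previous years.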

For the final steps, we are going to continue writing up our blog, and include the visualizations in the format that best fits the output (for example, the heat maps were saved as static images because they took too long to run, but we will show the code that generated the graphs). We might also add one more visualization.

katcorr commented 3 years ago

@kjinsanity @Baraza10699 @kma7261

sounds good!

Update 3: 5/5