sta199-s23-2 / project-statisfactory

https://sta199-s23-2.github.io/project-statisfactory/

Peer review - Team 6 #8

Opened by snr35 1 year ago; comments by snr35 and sgnanavel23.

Peer review by: Stat4

Because police brutality and killings have become a national issue, the project's goal is to find data, identify trends, and assess whether BIPOC Americans are at greater risk of being killed by police than white Americans.

The data come from the Washington Post repository on fatal police shootings from 2015 to 2020. Because the repository depends on curated news reports, it may omit important information such as gender and minority status. The data were originally collected by manually combing through local news reports and combining them with information from law enforcement websites, social media, and other databases (including Fatal Encounters and the "Killed by Police" project). Data collection began in 2015, spurred by a series of fatal shootings, and the information was updated in 2022.

To visualize disparities in police killings between BIPOC and white Americans, the team focused on four variables (age, shot, shot and tasered, and race), which they explored across six graphs.

The first graph uses a box plot to compare the ages of people who were shot with those who were shot and tasered. Its goal is to understand the role age plays and whether certain age groups are more vulnerable to violence.

The next two graphs focus on age, gender, and the shot / shot and tasered variable. One histogram is faceted by race, filled by age, and filtered to those who were shot. The second histogram is identical except that it is filtered to those who were shot and tasered. These two graphs build on the first and introduce the roles that race and age play in the killings.

The fourth and fifth graphs compare age, gender, and shot / shot and tasered: the fourth graph focuses on females and the fifth on males. Both put age on the x-axis and use logistic regression to compare shot (0) with shot and tasered (1).
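To make concrete what a logistic regression of shot (0) vs. shot and tasered (1) on age estimates, here is a minimal plain-Python sketch. It fits the model by batch gradient ascent on the log-likelihood; that fitting method, the function name `fit_logistic`, and the ten (age, outcome) pairs are hypothetical stand-ins, not the team's code or data.

```python
import math

# Hypothetical (age, outcome) pairs: outcome = 1 for "shot and tasered",
# 0 for "shot". Illustrative values only, NOT from the project's dataset.
ages_and_outcomes = [
    (18, 1), (22, 1), (25, 0), (27, 1), (30, 0),
    (35, 0), (41, 1), (48, 0), (55, 0), (62, 0),
]


def fit_logistic(pairs, lr=0.01, steps=20000):
    """Fit P(outcome = 1 | age) = sigmoid(b0 + b1 * z) by batch gradient
    ascent on the log-likelihood, where z is age standardized to mean 0
    and standard deviation 1 so that a fixed step size is stable."""
    ages = [a for a, _ in pairs]
    mu = sum(ages) / len(ages)
    sd = (sum((a - mu) ** 2 for a in ages) / len(ages)) ** 0.5
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for age, y in pairs:
            z = (age - mu) / sd
            p = 1 / (1 + math.exp(-(b0 + b1 * z)))
            g0 += y - p          # d(log-likelihood) / d(b0)
            g1 += (y - p) * z    # d(log-likelihood) / d(b1)
        b0 += lr * g0
        b1 += lr * g1
    # Return the fitted curve as a function of raw age.
    return lambda age: 1 / (1 + math.exp(-(b0 + b1 * (age - mu) / sd)))


prob_tasered = fit_logistic(ages_and_outcomes)
for age in (20, 40, 60):
    print(age, round(prob_tasered(age), 3))
```

The fitted model is a single curve giving the estimated probability of "shot and tasered" at each age; the estimated probability of "shot" alone is one minus that value.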

The sixth graph focuses on whether the victim was fleeing and, if so, how. The team used a bar chart with three qualitative scenarios on the x-axis (did not flee, fled by car, fled on foot), filled by the shot / shot and tasered variable. They use multiple linear regression to estimate the importance of the additional explanatory variable (fleeing status) in explaining the response variable (level of violence). This graph is meant to help account for fleeing status as a potential confounding variable.

Graph 1 could present the relationship between age and manner of death (shot vs. shot and tasered) differently. In the box plot, it is hard to read off the exact median age of people who were shot. A histogram/bar graph of age faceted by “shot” and “shot and tasered”, with the median age marked for each category, would make the medians easier to read.
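As a rough illustration of pairing an explicit median with a binned age distribution, here is a minimal Python sketch. The `rows` data and the helper names `median_age_by_manner` and `age_histogram` are hypothetical, chosen only to show the computation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (age, manner) pairs for illustration only; NOT real data.
rows = [
    (23, "shot"), (31, "shot"), (45, "shot"), (37, "shot"), (29, "shot"),
    (34, "shot and tasered"), (27, "shot and tasered"), (41, "shot and tasered"),
]


def median_age_by_manner(data):
    """Group ages by manner of death and return each group's median age."""
    groups = defaultdict(list)
    for age, manner in data:
        groups[manner].append(age)
    return {manner: median(ages) for manner, ages in groups.items()}


def age_histogram(data, manner, width=10):
    """Count ages for one manner category in bins of `width` years,
    keyed by each bin's lower edge (e.g. 20 means ages 20-29)."""
    counts = defaultdict(int)
    for age, m in data:
        if m == manner:
            counts[(age // width) * width] += 1
    return dict(sorted(counts.items()))


print(median_age_by_manner(rows))
print(age_histogram(rows, "shot"))
```

Reporting the median as a number (or a labeled reference line) alongside each facet avoids asking readers to estimate it from the plot.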

For graphs 4 and 5, the line of best fit does not seem to add much to understanding the relationship between age and manner of violence. If you do keep it, there should be a second one for the “Shot&Tasered” category, since currently there is none. We feel that a histogram would represent the data better.

The fifth graph could be improved visually by placing the labels (“Shot&Tasered” and “Shot”) above or below the data points so that it is easier to read.

There is a statistical concern that the team did not directly address. Although the full dataset has a sufficient number of data points, the number of data points within each “fleeing category” could differ greatly, which could explain why so many more people appear to have been shot among those “not fleeing” than in the other categories. For example, if 75% of the data points were “not fleeing” and only 3% were “fled by foot,” the sixth graph could be misleading. We recommend reporting the number of cases in each category and highlighting this potential source of bias.
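A minimal Python sketch of the check suggested above: count the cases in each fleeing category and compare the share shot within each category, rather than raw counts. The `records` below are hypothetical, built only to mirror the 75% / 3% example, and `category_summary` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical (flee_status, manner) records mirroring the example split
# (75% "not fleeing", 3% "fled on foot"). NOT taken from the real dataset.
records = (
    [("Not fleeing", "shot")] * 70
    + [("Not fleeing", "shot and tasered")] * 5
    + [("Car", "shot")] * 20
    + [("Car", "shot and tasered")] * 2
    + [("Foot", "shot")] * 2
    + [("Foot", "shot and tasered")] * 1
)


def category_summary(rows):
    """Return {category: (n, share_of_all_rows, share_shot_within_category)}."""
    total = len(rows)
    n_by_cat = Counter(cat for cat, _ in rows)
    shot_by_cat = Counter(cat for cat, manner in rows if manner == "shot")
    return {cat: (n, n / total, shot_by_cat[cat] / n) for cat, n in n_by_cat.items()}


for cat, (n, share, shot_rate) in category_summary(records).items():
    print(f"{cat:<12} n={n:<3} {share:.0%} of rows, {shot_rate:.0%} shot")
```

In this toy data, the "not fleeing" bar of people shot is 35 times taller than the "fled on foot" bar (70 vs. 2), largely because that category has far more cases; the within-category shares tell a different story than the raw bar heights.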

We are most interested in the new visualizations this team could make, since this is a comprehensive dataset. We are also interested in whether the manner of fleeing affects the manner of the shooting; there is one graph on this so far, and we would like to see more!

There were no reproducibility issues: the project rendered without errors. The teammates who cloned and rendered it were Sathvika and Sreya.

We feel it would help to include the reasoning behind each graph, or at least a few sentences describing what it shows and what quick conclusions can be drawn from it, so that readers do not have to scroll to the bottom. Overall, the introduction was clear; however, stating your research question clearly at the beginning would help anyone quickly glancing at your file understand the gist of your project.

From this team’s project, we learned the value of visualizing relationships among multiple variables. We are considering adding more trend analyses and graphs that cover more variables in our dataset than just “UN Region” and “Income Level,” to provide a more holistic and in-depth review.