Peer review - GitHub issues

Peer review by: Banana Boat
Names of team members that participated in this review: Cooper Likosar, Grant Williams, Pascal Bell, Arthur Herman, and Jonathan Nguyen
Describe the goal of the project. This team is trying to analyze the factors that cause heart disease and the magnitude of their effects. We think this research question is strong, but it could be strengthened by specifying the population that it addresses.
Describe the data used or collected, if any. If the proposal does not include the use of a specific dataset, comment on whether the project would be strengthened by the inclusion of a dataset. The data set is legitimate, as it has been used in previous research at the Hungarian Institute of Cardiology and other credible institutions. All of the relevant variables, such as age and cholesterol level, are included in the data set. This data set is a great starting point for this study because it is sufficiently narrow: it includes only four sites.
Describe the approaches, tools, and methods that will be used.
1. Creating new variables in the data set by mutating existing ones.
2. Using bar plots and histograms for data visualization.
3. Using linear regression and R-squared values to describe the relationships between variables of interest.
Provide constructive feedback on how the team might be able to improve their project. Make sure your feedback includes at least one comment on the statistical reasoning aspect of the project, but do feel free to comment on aspects beyond the reasoning as well. The description of the methodology is very clear and separates out the different variables that might cause heart disease, such as age. The histograms and bar plots serve this purpose well by showing the proportion of heart disease across genders and cholesterol levels.

Although the linear regression does a good job of showing the strength of the relationships between different variables and heart disease, it might be a good idea to experiment with interaction models, adding more variables to the linear regression, to find further insight into these relationships. It would also be great to have a more comprehensive conclusion that walks the audience through the methodology and the insights the team has found.
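
To make the interaction-model suggestion concrete, here is a sketch of what such a model could look like. The choice of age and cholesterol as the two predictors is our assumption for illustration, not the team's actual model:

```latex
\widehat{y} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{chol} + \beta_3\,(\text{age} \times \text{chol}),
\qquad
\frac{\partial \widehat{y}}{\partial\,\text{chol}} = \beta_2 + \beta_3\,\text{age}
```

In a model like this, the coefficient on the product term describes how the association between cholesterol and the response changes with age. If that coefficient is close to zero, the simpler additive model is probably enough.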

What aspect of this project are you most interested in and would like to see highlighted in the presentation?

We would like to see multiple interaction models showing how different variables, when combined, can increase the chance of heart disease more than each variable does on its own.

Were you able to reproduce the project by clicking on Render Website once you cloned it? Were there any issues with reproducibility?

Yes, we were able to render the website and there were no issues with reproducibility.

Provide constructive feedback on any issues with file and/or code organization.

There are some spots with formatting and grammatical issues, but overall, the code organization looks great!

What have you learned from this team's project that you are considering implementing in your own project?

We would love to consider adding more variables to our own linear regression and trying different options for our analysis.

(Optional) Any further comments or feedback?

GREAT JOB ON YOUR WORK! We are excited to watch your presentation.

Peer review by: r-s2dio

Names of team members that participated in this review: Angelie Quimbo, Kaitlyn Maher, Jack Roberts, Lukas Sanchez

Describe the goal of the project: The goal of this project is to determine how heart disease presence and magnitude vary with age, sex, cholesterol, blood sugar, and blood pressure, and which variables best predict an individual's likelihood of getting heart disease. The project specifically aims to test two hypotheses: 1) the older an individual is, the higher their cholesterol, blood sugar, and blood pressure, and thus the greater their likelihood of developing significant heart disease; and 2) age, sex, cholesterol, blood sugar, blood pressure, and the number of major vessels are the best predictors of whether or not an individual will develop heart disease.

Describe the data used or collected, if any. If the proposal does not include the use of a specific dataset, comment on whether the project would be strengthened by the inclusion of a dataset. The data is from the Hungarian Institute of Cardiology and the Cleveland Clinic Foundation. The data includes all relevant variables (age, sex, blood sugar, blood pressure, etc.). Given that the team has narrowed their research to patients in Cleveland, Hungary, Switzerland, and VA Long Beach, we think this is a manageable amount of data to work with and does not necessitate the inclusion of another dataset.

Describe the approaches, tools, and methods that will be used. Methods used include:

Glimpsing the dataset
Mutating variables in the dataset
Visualizing the relationship between variables (gender, age, cholesterol, fasting blood sugar, blood pressure) and heart disease presence using ggplot (bar chart and histogram)
Filtering out entries with chol = 0
Using linear regression and R-squared values to compare relationships of chosen variables.
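
To illustrate the last two steps in the list above, here is a minimal sketch of dropping zero-cholesterol entries and then computing R-squared for a one-predictor linear fit. This is not the team's code: it is written in plain Python rather than their tooling, the records are made up, the helper names are hypothetical, and treating chol = 0 as a missing measurement is our assumption.

```python
# Made-up records for illustration only; not taken from the team's data.
records = [
    {"age": 45, "chol": 210},
    {"age": 52, "chol": 0},    # assumption: chol = 0 means "not measured"
    {"age": 60, "chol": 250},
    {"age": 38, "chol": 190},
    {"age": 66, "chol": 0},
    {"age": 71, "chol": 270},
]


def drop_zero_chol(rows):
    """Keep only rows with a non-zero cholesterol value."""
    return [r for r in rows if r["chol"] != 0]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


clean = drop_zero_chol(records)
ages = [r["age"] for r in clean]
chols = [r["chol"] for r in clean]
intercept, slope = fit_line(ages, chols)
r2 = r_squared(ages, chols, intercept, slope)
print(f"rows kept: {len(clean)}, R-squared: {r2:.3f}")
```

Filtering before fitting matters here because a zero cholesterol value would otherwise act as an extreme low outlier and distort both the slope and the R-squared.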

Provide constructive feedback on how the team might be able to improve their project. Make sure your feedback includes at least one comment on the statistical reasoning aspect of the project, but do feel free to comment on aspects beyond the reasoning as well.

We recommend larger bin widths for some of the charts, especially the ones that have gaps or extremes (bins that are all "yes" or all "no" for heart disease). The graphs are a bit difficult to read, especially the "heart disease rates by blood pressure for each gender" graph.
We noticed that the low R-squared values show that none of the predictors explains much of the variation in heart disease, but we think that not finding a strong relationship can still be constructive and useful for the data science world.
Add units to the graphs and the narrative.
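
To show why we suggest wider bins, here is a small sketch comparing heart disease rates computed with narrow and wide bins. It is plain Python rather than the team's plotting code, the blood pressure readings and disease flags are made up, and the function name `disease_rate_by_bin` is hypothetical.

```python
from collections import defaultdict


def disease_rate_by_bin(values, has_disease, bin_width):
    """Group values into bins of the given width and return, keyed by each
    bin's lower edge, the proportion of observations with heart disease."""
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [cases, total]
    for value, flag in zip(values, has_disease):
        lower = (value // bin_width) * bin_width
        counts[lower][0] += int(flag)
        counts[lower][1] += 1
    return {
        lower: cases / total
        for lower, (cases, total) in sorted(counts.items())
    }


# Made-up resting blood pressure readings (mm Hg) and disease flags.
bp = [112, 118, 124, 126, 131, 138, 142, 149, 155, 163]
disease = [False, False, False, True, False, True, True, False, True, True]

narrow = disease_rate_by_bin(bp, disease, bin_width=5)
wide = disease_rate_by_bin(bp, disease, bin_width=20)

print("width 5: ", narrow)
print("width 20:", wide)
```

With this toy data, every width-5 bin holds a single observation, so each rate is either 0 or 1. The width-20 bins pool several observations, and the middle bins show intermediate rates that are easier to interpret.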

What aspect of this project are you most interested in and would like to see highlighted in the presentation?

We are really interested in the results of this presentation but would like greater clarity in the graphs and more consistent variable descriptions in the narrative (variables are abbreviated in some places and spelled out in others, which can be unclear).

Were you able to reproduce the project by clicking on Render Website once you cloned it? Were there any issues with reproducibility?

Yes, we were able to Render the project. No issues with reproducibility.

Provide constructive feedback on any issues with file and/or code organization.

The code is fairly organized and the files open well. No major issues here.

What have you learned from this team’s project that you are considering implementing in your own project? Our research questions don't overlap much, but we'd like to implement:

Measures of model fit
Potentially a proportion graph/chart

(Optional) Any further comments or feedback?

Great job y'all!

sta199-s23-2 / project-sec-7-team-7

Peer review #6