introdsci / DataScience-OliviaAbbott

DataScience-OliviaAbbott created by GitHub Classroom

Final Review #8

Open tanyabonilla opened 4 years ago

tanyabonilla commented 4 years ago

Summary

Here, place a 1-paragraph summary that outlines (1) what the project investigated, (2) what insights/conclusions they found, and (3) what is the next planned step.

The project investigated how education has changed over time. The data came from the USDA (United States Department of Agriculture) Economic Research Service and the National Center for Education Statistics websites. She used both datasets to predict “…how teaching salaries have changed over time.” She found that since the p-value was about 0.3614, the graduates data turned out not to be a very good predictor of the percent change in salary. The next planned step is to gather more data and build more models. She concluded that although there is some correlation between student achievement and teacher salary, that does not imply causation.

Data Preparation

Here, describe (1) what tables have been developed and what kind of information they hold; (2) answer: does the portfolio demonstrate that it has tidy organization? (3) answer: does the portfolio demonstrate cleaned data? If any of these answers are NO or could be improved to make it easier for the general public to understand, provide specific guidance on how it could be improved.

In the first deliverable, the location and education tables were built from the USDA data. She renamed the columns to follow tidy data principles, converted categorical data to factors, and converted data stored as doubles to integers. In the second deliverable, she stored the web-scraped data in two tibbles, joined them into one table, and then followed the same procedure as for the first dataset. The data presented throughout the project is clean and tidy.
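
The cleaning steps described above could be sketched in base R roughly as follows. The table and column names here are hypothetical stand-ins, not the portfolio's actual names, and the data values are invented for illustration.

```r
# Two small example tables standing in for the scraped tibbles
salaries  <- data.frame(State = c("WA", "OR"), AvgSalary = c("52000", "48000"))
graduates <- data.frame(State = c("WA", "OR"), NumGrads  = c(1200L, 900L))

# Rename columns to consistent snake_case, following tidy naming conventions
names(salaries)  <- c("state", "avg_salary")
names(graduates) <- c("state", "num_grads")

# Convert categorical columns to factors and numeric strings to numbers
salaries$state       <- factor(salaries$state)
graduates$state      <- factor(graduates$state)
salaries$avg_salary  <- as.numeric(salaries$avg_salary)

# Join the two tables into one before modeling
combined <- merge(salaries, graduates, by = "state")
```

In the tidyverse, the same join would typically be written with `dplyr::left_join()`, but `merge()` keeps the sketch dependency-free.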

Modeling

Here, describe (1) what predictive models have been built and what are their (dependent variable) predictors?; (2) answer: does the portfolio accurately describe the purpose of the models? (3) answer: does the portfolio accurately interpret the model's summary?

The model used the number-of-graduates data from the first deliverable to predict the percentage change in salary between 1999-2000 and 2016-2017. She explains in both the second and third deliverables what each model was for and what question it set out to answer.
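
A model of this shape can be sketched with `lm()` as below. The variable names and data are invented for illustration; the portfolio's actual model uses its own column names and real USDA/NCES data.

```r
set.seed(42)
# Fake data standing in for the real tables: graduates as the predictor,
# percent change in salary as the response
dat <- data.frame(
  num_grads      = rnorm(30, mean = 1000, sd = 200),
  pct_change_sal = rnorm(30, mean = 5, sd = 2)
)

# Simple linear regression: percent change in salary ~ number of graduates
fit <- lm(pct_change_sal ~ num_grads, data = dat)

# The overall p-value (the ~0.3614 figure discussed above) comes from the
# model's F statistic
fs <- summary(fit)$fstatistic
overall_p <- pf(fs[1], fs[2], fs[3], lower.tail = FALSE)
```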

Validation

Here, answer: (1) has a model been cross-validated using testing and training sets? (2) has the accuracy of the cross-validation been explained clearly and appropriately?

Both models were cross-validated and the results were clearly explained. For example, she looked at the overall p-value of the first model, about 0.3614, which showed that the number-of-graduates data was not a very good predictor of the change in salary. For the second model, she explained that both models had very similar findings and a low overall p-value.
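
For readers unfamiliar with the technique, k-fold cross-validation can be sketched in base R as below. The portfolio may well use a package such as caret instead; the data and variable names here are hypothetical.

```r
set.seed(1)
dat <- data.frame(x = rnorm(50), y = rnorm(50))

k <- 5
# Randomly assign each row to one of k folds
folds <- sample(rep(1:k, length.out = nrow(dat)))

rmse_per_fold <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]          # fit on k-1 folds
  test  <- dat[folds == i, ]          # evaluate on the held-out fold
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$y - pred)^2))       # RMSE on the held-out fold
})

mean(rmse_per_fold)  # average error across folds
```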

R Proficiency

Here, describe the strengths and weaknesses of how the R code has been developed; is it easy to read and understand? Have appropriate R techniques been used to make the code easy to maintain and reuse? Have appropriate functional programming techniques been used?

Olivia displayed strong R proficiency and neatly presented the R code she used along with an explanation of what she was doing with it. All variables are appropriately named and describe what they store. She used sapply() a number of times to grab the data she needed instead of for loops. One weakness, though, is the messages printed when loading libraries.
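
Both points above can be illustrated briefly. The data here is invented; the point is the pattern: `sapply()` replaces an explicit for loop, and `suppressPackageStartupMessages()` hides the load-time chatter flagged as a weakness.

```r
# Wrapping library() silences startup messages in the rendered output
suppressPackageStartupMessages(library(stats))

vals <- list(a = 1:3, b = 4:6, c = 7:9)

# for-loop version
means_loop <- numeric(length(vals))
for (i in seq_along(vals)) means_loop[i] <- mean(vals[[i]])

# functional version: one sapply() call replaces the whole loop
means_sapply <- sapply(vals, mean)
```

In an R Markdown portfolio, the chunk option `message=FALSE` is another common way to keep library messages out of the knitted document.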

Communication

Has the portfolio been described in enough detail, but in wording that is easy for anyone to understand? Are visualizations used effectively to help communicate the data? What are its strengths and weaknesses?

She thoroughly explained her process and what the variables and results meant. The visualizations were easy to interpret and understand with her explanations of them; it helped that she changed the scaling of the x-axis. There were no notable weaknesses.

Critical Thinking

Does the operationalization and social impact demonstrate careful, critical thought about the future of the project? What are possible unintended consequences or variables that the author has not discussed?

Yes, she notes that a possible negative consequence of creating a government policy to increase teaching salaries would be actually finding the money to fund it. She mentions that if the money came from taxes, there would be a negative response from taxpayers. I feel that demographics should definitely be discussed as a variable, since the numbers would differ across areas.

OliviaAbbott commented 4 years ago

Data Preparation and Modeling (20/20)

I think that the data I used in this project is cleaned and organized well. The datasets I used were cleaned to follow tidy data principles and to be easy to understand. I also think that the models I used in the project are appropriate given the data I was able to find. I explained why I created the models and what I was trying to find out with each one.

Validation and Operationalization (18/20)

For the models I created, I used the k-fold cross-validation method to better test them, and I also tested the models on the test data. Although I explained the p-value of each model and what it meant, I did not explain the R², MAE, and RMSE values even though I calculated them. I removed two points because I calculated these values without explaining them; however, I think my analysis of each model was still well explained and easy to understand.
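
For completeness, the three metrics mentioned above can be computed directly from actual and predicted values. The numbers here are invented for illustration only.

```r
actual    <- c(3.1, 4.0, 5.2, 6.1)
predicted <- c(3.0, 4.2, 5.0, 6.3)

# RMSE: typical size of a prediction error, in the units of the response
rmse <- sqrt(mean((actual - predicted)^2))

# MAE: average absolute error, less sensitive to outliers than RMSE
mae <- mean(abs(actual - predicted))

# R^2: share of the response's variance explained by the model
r2 <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
```

Reading these alongside the p-value gives the reader a fuller picture: the p-value speaks to whether a relationship exists, while RMSE/MAE/R² speak to how well the model actually predicts.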

R Proficiency (20/20)

I think that I did a good job explaining the R code that I was using, as well as writing code that is easy to understand and maintain. I made sure to use functional programming (such as sapply) in my project to take advantage of that aspect of R. Overall I think my code is easy to understand and well written.

Communication (19/20)

I think that I did a good job explaining my code and the purpose behind each step of my project. I made sure to explain my models and their purpose as well as the results from each model. I removed a point because I think that I could have communicated the results from my models a little bit better so that the reader would have more information to take away from each model. I still think the models are explained well, but looking back I realize that I could have added just a bit more.

Critical Thinking (20/20)

I tried to apply critical thinking to each step of this project. When explaining the purpose of each model I was trying to use critical thinking to justify why it made sense for me to create those models. I also used critical thinking in my analysis of the results of each model. I discussed some of the different possible consequences that could come from this project as well as what I would want to do next in the project.