Summary

The project investigated the amount of harmful particles being released into the air and the effect that population had on those levels. It found that population was statistically significant, but that there were some limitations and that other variables may be needed to explain why. As a next step, more data could be collected to help determine why population is a factor.

Data Preparation

This project uses a data set that records the levels of different molecules in the air. This data set is later joined with another data set containing population information. Both data sets were tidy, clean, and thoroughly explained.

Modeling

The model aimed to estimate how much effect population has on harmful pollutants in the air. It did so by predicting the concentration of molecules given the year, week, population, and the levels of different molecules in the air. The portfolio does a good job of interpreting the model's summary and explaining the statistical significance.

Validation

The model was trained using a training set and then validated using a separate test set. The accuracy of the cross-validation was explained clearly and in a simple manner, and two examples were shown.
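For readers unfamiliar with this kind of validation, the general idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the author's code: the data are synthetic, and the column names (`ozone`, `no2`), the 75/25 split, and the use of RMSE as the accuracy measure are all hypothetical choices.

```r
# Hypothetical data: column names and values are invented for illustration only.
set.seed(42)
n <- 200
air <- data.frame(
  year       = sample(2000:2015, n, replace = TRUE),
  week       = sample(1:52, n, replace = TRUE),
  population = runif(n, 1e5, 5e6),
  no2        = rnorm(n, mean = 20, sd = 5)
)
# Synthetic response with a small population effect plus noise.
air$ozone <- 10 + 2e-6 * air$population + 0.3 * air$no2 + rnorm(n, sd = 2)

# Hold out 25% of the rows as a test set that is never used for fitting.
test_idx <- sample(seq_len(n), size = floor(0.25 * n))
train <- air[-test_idx, ]
test  <- air[test_idx, ]

# Fit a linear model on the training rows only.
fit <- lm(ozone ~ year + week + population + no2, data = train)

# Measure accuracy on the held-out rows with root mean squared error.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$ozone - pred)^2))
rmse
```

Calling `summary(fit)` on such a model prints the coefficient table, including the p-value for `population`, which is the kind of output the portfolio interprets when discussing statistical significance.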

R Proficiency

Throughout the project, all of the code was very easy to read, had descriptive variable names, and used appropriate functions and programming techniques. Additionally, the code would be very easy to maintain and reuse, since proper programming practices were followed and everything is laid out very clearly.

Communication

The portfolio is laid out clearly and is simple to read. The visualizations are effective and help communicate the point of the data. They clearly show the trends in different levels over time. The visualizations in the first deliverable were very nice. I think the right amount of detail was given to make things easy to understand while keeping the portfolio quick and easy to read.

Critical Thinking

The author has discussed all variables thoroughly and considered many of the potential consequences. Many of these are well thought out, including the additional variables that should be collected for further testing. Another interesting variable that might be worth looking into is more regional data. This could show whether certain regions are worse for the environment than others and give insight into why.

Data Preparation and Modeling (18% out of 20%)

Rationale for score: I believe that my data was neat and tidy. However, my visualizations at the end of part 2 could be improved, because they are somewhat difficult to understand.

Validation and Operationalization (20% out of 20%)

Rationale for score: I feel I have thoroughly discussed how this project could be operationalized. Although my models did not predict as strong a correlation between population and air pollution as I anticipated, I learned that there are many other variables I didn't look into that could explain this. I communicated this and suggested adding these variables to future studies.

R Proficiency (20% out of 20%)

Rationale for score: I feel that my code was written cleanly and efficiently. I used a number of new methods and functions that I hadn't used in my previous R experience, and I feel like I have a much better understanding of how to manipulate the data.

Communication (20% out of 20%)

Rationale for score: I have communicated the results effectively, so that someone who doesn't understand R code or my topic can still follow them. The portfolio flows smoothly from one section to the next. I even sent my published pages to a friend of mine who is unfamiliar with this area of study, and she was able to understand them.

Critical Thinking (20% out of 20%)

Rationale for score: Because my models did not show the conclusions that I thought they would, I had to think in depth about my topic and figure out why I wasn't getting the results that I anticipated. I came up with several plausible reasons why this could be, including limitations of my model and environmental factors that were not taken into account. I also discussed ways to improve this study in the future and what the results of this study could potentially improve in society.

introdsci / DataScience-kcrisci94

Final Review #5