Summary

This project investigates the financial benefits of working for a city such as San Francisco, including medical, dental, overtime, and retirement benefits, along with a person’s total benefits based on their job. Overall, it explained most of its predictions and was fairly informative.

Data Preparation

The tables developed were Employee_Compensation, Employee_Earnings, Occupation, and Occupation_Association. Employee_Compensation holds the compensation an employee receives at the job, such as medical, overtime, and retirement. Employee_Earnings holds an individual’s financial information. Occupation holds the attributes of a job, including its Job Family and Union. Occupation_Association shows the hierarchy that a job falls under. Yes, the portfolio demonstrates tidy organization. Although some variables contained two different types of input, they were tidied and separated appropriately. The data was clean for the most part, although there were some variables that I didn't think were needed for the specific task you were trying to accomplish.

Modeling

A model has been built to see if we can predict the total benefits of each job. Along with that, a model predicting a job’s Total Benefits from the other variables of the job’s pay was built. The summary of the total benefits model reported a multiple R-squared of 0.997, an adjusted R-squared of 0.997, and a p-value < 0.00000000000000022. The models in the portfolio do explain total benefits based on the job but do not fully explain why a certain job/organization is more popular than others. To explain that, you would need another model and perhaps more data. The write-up did interpret the model's summary pretty well.

Validation

Yes, it has. There was a lot to test, which made it a little confusing to grasp it all. I would suggest condensing it a little to make it more readable and understandable. Besides that, the R-squared, MAE, and RMSE seemed to support the narrative.
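
For reference, the three metrics mentioned above are usually defined as sketched below. The notation is my own assumption and not taken from the portfolio: n held-out observations, actual values y_i, predicted values \hat{y}_i, and the mean of the actual values \bar{y}.

```latex
\begin{align*}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{align*}
```

Lower MAE and RMSE and an R-squared closer to 1 indicate predictions closer to the actual values. Some tools instead report R-squared as the squared correlation between predictions and actual values, so the portfolio's figure may follow a different convention.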

R Proficiency

Strengths of the R code were that it was well formulated and flowed nicely within the Rmd files. However, there was a lot of code where I didn't know what it was doing or why it was there. When writing a lot of code, it is often good to explain some of it so the reader knows what's happening within the R code. The code is easy to access, but there is a lot of it that I'm unclear about. The right functional programming techniques have been used.

Communication

There was a lot of detail in this project, so much in fact that it was hard to comprehend it all in under 30 minutes. The wording wasn't the easiest to follow, and I suggest wording it more clearly in some parts. The visualizations looked good, and they described what the variables meant. The downside is that they were a little hard to put into complete perspective with regard to the portfolio because there was a lot going on.

Critical Thinking

It does show critical thought, but in terms of the future you could say a little more about the impact of good benefits on future jobs. There are some social aspects that could be addressed: if everyone gets a good job in the city, what will that mean for jobs outside of major cities? Will those jobs get better or worse?

Data Preparation and Modeling (20% out of 20%)

I took a lot of time cleaning the data and also made sure that the tables had the right information stored in them: employee data, job data, and occupational data. I also organized the job listings to separate the salary ranges and turned them into numerical columns.

Validation and Operationalization (17% out of 20%)

I could have explained my cross-validation in more depth, but I think my operationalization was efficient based on the information I found in my data.

R Proficiency (19% out of 20%)

At the beginning of the deliverables, I made some no-nos by using for loops instead of sapply, but I fixed those mistakes. I think I could have used other functions to do things more efficiently, but I am still learning.

Communication (20% out of 20%)

I think I explained my project in clear detail and even explained what most of my code did. I did explain a lot, but if you took your time, with no rush, I think it would make more sense.

Critical Thinking (18% out of 20%)

I could have explained more about the social implications of my models and operationalization, as my reviewer said.

introdsci / DataScience-maklh899

Final Review #2