# Data Preparation and Modeling (*your score* out of 20%)
I am quite proud of my data cleaning and modeling. I have become an expert at cleaning, scraping, and transforming my data. My models made sense, and I gave a strong explanation of them.
20/20
# Validation and Operationalization (*your score* out of 20%)
I used cross validation by splitting my data into training and test sets, and Brendan agreed with this approach. My operationalization was quite good, and Brendan added some thoughts of his own.
20/20
# R Proficiency (*your score* out of 20%)
20/20
# Communication (*your score* out of 20%)
My English and communication are not my strengths. I'm continuing to work on them to improve myself. I think I could have done this better.
19/20
# Critical Thinking (*your score* out of 20%)
Brendan did point out some thoughts of his own. I wanted to work on the beauty dataset, but I was not able to connect my different datasets.
19/20
*20/20*
# Summary
This project used LinkedIn data and job listings in Australia to investigate how various factors play into an employee's salary, employment duration, job type, and so on. He concluded that his models could be of interest to HR departments when hiring new directors. Moving forward, he would like to gather more data on director salaries to get a more comprehensive picture of what directors are paid across industries.
# Data Preparation
He developed tables from a person's LinkedIn data, from Australian job listings collected with a web scrape, and from "beauty" data that describes variables similar to the previous sets but also accounts for predicted beauty metrics. The tables are tidy, and in his final deliverable he appropriately addressed the issues raised in the first two deliverables. One suggestion for tidying the display a little more would be to paginate the "linkedin2" dataset: it has so many variables that you have to scroll sideways across the web page to see them all.
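A minimal sketch of that suggestion, assuming the tables are rendered with the DT package. The search box mentioned under Communication hints at an interactive table widget, but the portfolio's actual table code is not shown here. The built-in `mtcars` data is a hypothetical stand-in for `linkedin2`. `pageLength` and `scrollX` are DataTables options passed through `options`.

```r
library(DT)

# mtcars is a hypothetical stand-in for the wide linkedin2 table
tbl <- datatable(
  mtcars,
  options = list(
    pageLength = 10,  # show 10 rows per page
    scrollX = TRUE    # scroll wide tables inside the widget, not the whole page
  )
)
# In an HTML R Markdown document, leaving `tbl` as the last expression of a chunk renders it
```

With `scrollX = TRUE`, the horizontal scrolling stays inside the table widget rather than stretching the whole page.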
# Modeling
The models were built using a cross-validation method. He modeled location as a predictor of pay, average position length, previous tenure length, and tenure length as a predictor of beauty metrics and position durations, based on LinkedIn data such as follower count. I feel that the portfolio accurately interprets the models' summaries, as their strengths and weaknesses are appropriately addressed.
# Validation
The model was validated by splitting the data into training and testing sets at a 70/30 ratio. Your description of why your first model is not sufficient is good, and so is your explanation of the other models, especially what the R^2 values mean for your models.
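A 70/30 train/test split like the one described can be sketched in base R as follows. The built-in `mtcars` data and the `mpg ~ wt + hp` formula are hypothetical stand-ins, not the portfolio's models. A single split like this is holdout validation; k-fold cross validation would repeat the fit over several folds. The sketch contrasts the in-sample R^2 from `summary()` with an out-of-sample R^2 computed on the held-out rows.

```r
set.seed(2024)  # make the random split reproducible

n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))  # 70% of rows for training
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]                            # remaining 30% held out

# Hypothetical stand-in model
fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)

# In-sample R^2 as reported by summary()
train_r2 <- summary(fit)$r.squared

# Out-of-sample R^2 on the held-out rows: 1 - SSE / SST (SST around the test-set mean)
test_r2 <- 1 - sum((test$mpg - pred)^2) / sum((test$mpg - mean(test$mpg))^2)
```

Comparing `train_r2` with `test_r2` shows how much fit is lost on unseen data. The test value can even be negative if the model predicts worse than the test-set mean.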
# R Proficiency
I ran into a problem with inconsistent file naming, but it was an easy fix: the previous deliverables you load were not named what your .Rmd file calls them. On a related note, since you already load your first deliverable into your second, it would be enough to load only your second deliverable into your third, rather than loading both the first and the second. This saves time when running the R code.
Good use of the trimws() function; I could have used that in my project. One suggestion for your graphs would be to use ggplot's styling features. For example, you could edit axis labels and titles, and tilt the tick mark labels so that they don't overlap. I think that would really clean up those visualizations. Using suppressMessages() and suppressWarnings() would also help, so that someone reading your page doesn't have to scroll through package-loading output or other messages that aren't needed to view your portfolio.
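Here is a short sketch of the suggestions above, assuming ggplot2 is installed. The built-in `mtcars` data and the location and tenure strings are hypothetical stand-ins, not the portfolio's data. In an .Rmd, `knitr::opts_chunk$set()` is an alternative that hides messages and warnings in every chunk at once.

```r
# Load ggplot2 without printing package startup messages in the knitted page
suppressMessages(library(ggplot2))

# In an .Rmd setup chunk, this hides messages and warnings for every chunk:
# knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# trimws() strips leading and trailing whitespace, common in scraped text
# (hypothetical example strings)
locations <- trimws(c("  Sydney", "Melbourne  ", " Brisbane "))

# suppressWarnings() silences the "NAs introduced by coercion" warning
# (hypothetical example strings)
tenure_years <- suppressWarnings(as.numeric(c("3", "n/a", "5")))

# Styled plot: explicit title and axis labels, plus tilted x tick labels
# so long category names do not overlap
cars <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg)
p <- ggplot(cars, aes(x = model, y = mpg)) +
  geom_col() +
  labs(title = "Fuel economy by car model",
       x = "Car model", y = "Miles per gallon") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

`theme()` is added after `theme_minimal()` so the tilted labels override the preset theme rather than being reset by it.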
# Communication
Your implementation of the search function in your tables is a nice touch. I was easily able to understand what you were investigating, and the models and conclusions were clearly described. The visualizations did a good job of supporting your investigation. Perhaps plotting some of the model variables you found significant against each other would be a good way to visualize trends.
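Here is a sketch of that last suggestion, assuming ggplot2. The built-in `mtcars` columns `wt` and `mpg` are hypothetical stand-ins for whichever model variables turned out to be significant. A fitted linear trend line makes the relationship easier to see than the points alone.

```r
library(ggplot2)

# Scatter two (stand-in) significant variables against each other,
# with a linear trend line and its confidence band
p_trend <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Miles per gallon vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Passing `formula = y ~ x` explicitly avoids the message `geom_smooth()` otherwise prints about the formula it chose.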
# Critical Thinking
His operationalization and social impact discussion explains well how the models would be implemented. It also considers unintended consequences, such as HR representatives inappropriately adjusting a director's pay based on what the model indicates. I think a deeper investigation into the beauty metrics would be really interesting and could have significant social impact. The discussion only partially addresses some unintended consequences of the directors model. These could include a business's tendency to conform its model to a new hire, or job seekers trying to artificially inflate their follower counts for the sake of being hired.