Final Report Peer Review

HR Analytics

The HR Analytics group set out to discover why employees leave a company and to identify the employees most likely to leave. They divided their dataset of employees into two groups: “attritioners” (people who left the company within the year) and non-attritioners. The data was compiled from three different datasets containing employee information, joined on the employee ID.
I really liked that this group spent time and effort identifying the least important features and removing them in order to reduce overfitting of their models. Some removed variables had the same response for every employee, which rendered them useless. For other removed variables, the group strategically removed those with very strong correlation.
I didn’t particularly like the group’s handling of the missing data. I think it was too bold a decision to replace all missing values in “environment satisfaction,” “job satisfaction” and “work life balance” with the minimum of each column. I similarly thought it was too bold to replace all missing values in “number of companies worked” and “total working years” with those columns’ median values. The histograms of these two variables show distinct, unusually tall bins that must correspond to the medians. I bet that imputing the missing values in this way had a negative impact on all subsequent models. I think that attempting matrix completion would have been a much safer bet.
I was very impressed by the exploratory analysis of this group. I thought it was cool that the group showed the age distribution for attritioners and non-attritioners and discussed the intuition that younger people are more likely to move around.
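To make the median-imputation concern above concrete, here is a minimal sketch on synthetic data. The column values, the 20% missing rate, and the helper name `impute_with_median` are all assumptions for illustration; none of them come from the group's report. It shows how filling every gap with the median piles extra mass onto one value, which would appear as a tall histogram bin, and cannot increase the spread of the column.

```python
import random
import statistics


def impute_with_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values], median


random.seed(0)
# Hypothetical "total working years" column: 200 employees, values 0..40.
true_values = [random.randint(0, 40) for _ in range(200)]
# Knock out roughly 20% of entries at random to simulate missing data.
with_missing = [None if random.random() < 0.2 else v for v in true_values]

imputed, median = impute_with_median(with_missing)

observed = [v for v in with_missing if v is not None]
n_missing = with_missing.count(None)
count_before = observed.count(median)
count_after = imputed.count(median)

print("median used for imputation:", median)
print("entries equal to the median before/after:", count_before, count_after)
print("std dev before:", statistics.pstdev(observed))
print("std dev after: ", statistics.pstdev(imputed))
```

Every missing entry lands on the same value, so the count at the median grows by exactly the number of missing entries while the standard deviation does not grow. A model trained on the imputed column sees an artificial cluster there.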
Throughout the exploratory analysis and modeling I kept noticing that the most important predictors were the ones that intuitively made sense (although I wish the group had emphasized this a bit more). For example, years since last promotion was very important in determining whether someone was likely to leave. I also liked that this group repeatedly demonstrated the important features through various models. I particularly liked the odds ratios from the logistic regression and the feature importance graph from the random forest. Something that concerned me here, though, is that the top feature from the logistic regression (by a factor of 2) ranked only eighth in the random forest. An explanation of these differences would be useful.
In general, I wish the group had written a little more in their conclusion about what to do with the information they gained from the models. It is pretty obvious that higher wages and more promotions would lead to fewer employees leaving; however, these goals aren’t really feasible for a company with limited cash and limited management positions. I would have liked the group to identify some variables that would be easier for a company to manipulate. Finally, I think the report could have used a couple more proofreads. I found a few grammar mistakes.
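One possible contributor to the ranking mismatch noted above (an assumption on my part, not something the report confirms) is that a logistic-regression odds ratio is expressed per unit of the predictor, so its size depends on the units, while random forest importances do not. The sketch below uses made-up coefficients and standard deviations, and the helper names `odds_ratio` and `rank_by_effect` are hypothetical. It only illustrates that a ranking by per-unit odds ratios can flip once the effects are compared per standard deviation.

```python
import math


def odds_ratio(beta, step=1.0):
    """Odds ratio for raising a predictor by `step` units, given its log-odds coefficient."""
    return math.exp(beta * step)


def rank_by_effect(features, per_sd):
    """Rank feature names by |log odds ratio|, per unit or per standard deviation."""
    def strength(name):
        beta, sd = features[name]
        return abs(beta * (sd if per_sd else 1.0))
    return sorted(features, key=strength, reverse=True)


# Hypothetical (coefficient, standard deviation) pairs, for illustration only.
features = {
    "monthly_income": (-0.00001, 47000.0),      # coefficient per currency unit
    "years_since_last_promotion": (0.15, 3.0),  # coefficient per year
}

for name, (beta, sd) in features.items():
    print(name, "per unit:", odds_ratio(beta), "per SD:", odds_ratio(beta, sd))

print("ranking per unit:", rank_by_effect(features, per_sd=False))
print("ranking per SD:  ", rank_by_effect(features, per_sd=True))
```

If the group's predictors were not standardized before fitting, reporting odds ratios per standard deviation would make the comparison with the random forest ranking fairer.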

silviaruiz44 / HRAnalytics

Final Report Peer Review #9