silviaruiz44 / HRAnalytics

0 stars 0 forks source link

Midterm peer review zl683 #6

Open Kitty763 opened 4 years ago

Kitty763 commented 4 years ago

This project aims to predict whether an employee is likely to leave the company using data from three different sources: an employee survey, and employee database and a direct manager survey. The data are from three different departments within the company.

Three things I like about the report:

  1. The topic itself is very realistic, and I can imagine that companies would find this model very useful because recruting cost is very high. report is very well written and detailed. The structure is clear, the layout allows more information, and there are many figures and charts for illustrating the points.
  2. Logistic regression is a suitable model for this project. The whole preliminary analysis process is quite clear and standard: exploratory data analysis, model fitting, train-test split and confusion matrix.
  3. Exploratory data analysis is very detailed. Data types are listed and for each feature there are statistics about it. Also this is used to determined how big a role certain feature is going to play in the prediction.

Three areas for improvement:

  1. A correlation plot could be added to further explore the relations between different features. After looking into each feature one by one, it is necessary to see how they interact with each other. For example if you see too features are highly correlated you may consider drop one of them.
  2. Exploring more models. Since in this stage you are applying preliminary data analysis, you can try more models and pick one with good performance or higher interpretability. For example if we picture using this model in a real world scenario, explaining logistic regression to non-technicians may take some time, and a decision tree could be more easily understood.
  3. Considering applying this model in real life, a false positive and a false negative could mean complete differet moves a company should take, and the cost are different. It’s good that you emphasized this. However the confusion matrix looked a little bit confusing. From the table I cannot tell whether the columns are predictions or the rows are.
  4. Inspired by one of our homeworks, I think comparing department models and a company model would be interesting because models for different departments might be very different.