soton-data-mining / job-salary-prediction

A regression problem: predicting job salaries in the UK based on various criteria

removing highly correlated variables? #35

Closed bora-pajo closed 7 years ago

bora-pajo commented 7 years ago

I am wondering whether, as a rule of thumb, we should remove variables that are highly correlated with each other (say, over 0.80) from any classification analysis. Is that correct, or does PCA take care of it so that we do not even need to check for correlation? Thank you in advance for your answers.

utkuozbulak commented 7 years ago

It really depends. Two features may have a correlation of 0.90, but the remaining 0.10 might have a huge impact on the prediction. On the other hand, it might be garbage. The solution isn't always to throw a dimensionality reduction algorithm (such as PCA) at the problem; sometimes it is just flat-out removing a feature. We try different approaches, see what works best for our model, and proceed accordingly.
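
To make the "it depends" point concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration and not taken from this repository: the data is synthetic, the feature names `a`, `b`, `c` and the helpers `correlated_pairs` and `holdout_rmse` are hypothetical, and a plain least-squares fit stands in for whatever model is actually used. The target is deliberately built so that it depends on the small part of `b` that is not explained by `a`.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.80):
    """Return (name_a, name_b, r) for feature pairs with |r| above threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


def holdout_rmse(X_train, y_train, X_test, y_test):
    """Fit least squares with an intercept; return RMSE on the held-out split."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.sqrt(np.mean((A_test @ coef - y_test) ** 2)))


# Synthetic data, made up for this example.
rng = np.random.default_rng(0)
n = 400
a = rng.normal(size=n)
b = a + 0.3 * rng.normal(size=n)  # b is highly correlated with a
c = rng.normal(size=n)
# The target depends on the difference between a and b,
# i.e. on the part of b that a does not explain.
y = 2.0 * a - 2.0 * b + c + 0.1 * rng.normal(size=n)

names = ["a", "b", "c"]
X = np.column_stack([a, b, c])

# Step 1: flag pairs above the 0.80 rule-of-thumb threshold.
pairs = correlated_pairs(X, names, threshold=0.80)
print("highly correlated pairs:", pairs)

# Step 2: compare held-out error with and without the flagged feature.
train, test = slice(0, 300), slice(300, n)
keep = [0, 2]  # drop "b"
rmse_full = holdout_rmse(X[train], y[train], X[test], y[test])
rmse_dropped = holdout_rmse(X[train][:, keep], y[train], X[test][:, keep], y[test])
print(f"RMSE with all features: {rmse_full:.3f}")
print(f"RMSE after dropping b:  {rmse_dropped:.3f}")
```

In this constructed case, dropping `b` makes the held-out error worse even though the filter flags it as redundant. With a different target (one that ignores the leftover part of `b`), the same comparison would show no loss, which is why checking on validation data is more reliable than a fixed threshold.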

bora-pajo commented 7 years ago

Thank you, @utkuozbulak!