knageswara78 / Python_Solutions


Correlation of 2 variables. #1

Open: knageswara78 opened this issue 5 years ago

knageswara78 commented 5 years ago

Correlation between two variables.
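For just two variables, the Pearson correlation coefficient r = cov(x, y) / (std(x) * std(y)) can be computed directly, without building a full matrix. A minimal sketch, assuming two made-up numeric series `x` and `y` (illustrative data, not from the thread), showing that pandas `Series.corr`, NumPy `corrcoef` and the formula itself agree:

```python
import numpy as np
import pandas as pd

# Two hypothetical variables; y is roughly linear in x
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.1, 3.9, 6.2, 8.1, 9.8])

# pandas: Series.corr uses the Pearson method by default
r_pandas = x.corr(y)

# NumPy: corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

# From the definition: sum of centred products over the product of the centred norms
xc, yc = x - x.mean(), y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

print(r_pandas, r_numpy, r_manual)
```

All three values are the same (about 0.999 here); a value near +1 or -1 means a strong linear relationship, and a value near 0 means little linear relationship.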

knageswara78 commented 5 years ago

import numpy as np

threshold = 0.9

# Create the absolute correlation matrix
# (df is an existing pandas DataFrame; non-numeric columns are skipped)
corr_matrix = df.corr(numeric_only=True).abs()

# Select the upper triangle of the correlation matrix (k=1 excludes the diagonal)
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

# Find features with correlation greater than the threshold
to_drop = [column for column in upper.columns if (upper[column] > threshold).any()]
print(to_drop)  # Each of these is highly correlated with an earlier column.

# Drop those features (drop() returns a new DataFrame, so reassign it)
df = df.drop(columns=to_drop)
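The snippet above assumes `df` already exists. Below is a self-contained sketch of the same steps on a small made-up DataFrame, wrapped in a hypothetical helper `drop_correlated` (not a pandas function). It assumes pandas 1.5 or later for the `numeric_only` argument.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Hypothetical helper: drop one column from every pair with |Pearson r| > threshold.

    Only numeric columns take part in the correlation; other columns are kept.
    Returns the reduced DataFrame and the list of dropped column names.
    """
    corr_matrix = df.corr(numeric_only=True).abs()
    # Boolean mask of entries strictly above the diagonal, so each pair is checked once
    mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
    upper = corr_matrix.where(mask)
    # A column is dropped if it is highly correlated with any column before it
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy data: "b" is exactly 2 * "a", "c" is only moderately related, "label" is text
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
    "label": ["p", "q", "r", "s", "t"],
})
reduced, dropped = drop_correlated(df, threshold=0.9)
print(dropped)               # ['b']
print(list(reduced.columns)) # ['a', 'c', 'label']
```

Using only the upper triangle skips each column's self-correlation of 1.0 and makes sure only the later column of a correlated pair is dropped. The selection is done once on the original matrix, so a column can still be dropped because of a column that is itself being dropped.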