nubank / fklearn

fklearn: Functional Machine Learning
Apache License 2.0

Add feature_clustering_selection method #209

Open brunoleme opened 2 years ago

brunoleme commented 2 years ago

Status

READY

Todo list

Background context

This is a correlation-based feature selection method. Unlike the existing correlation_feature_selection, which has no criterion for choosing among correlated features, feature_clustering_selection first clusters the features, using absolute correlation as the distance metric, and then selects from each cluster the feature with the lowest 1-R2 metric. The 1-R2 metric identifies the feature that best preserves the information in the other features of its own cluster (own-cluster R2), penalized by the information it shares with the nearest cluster (nearest-cluster R2).
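The two steps described above can be sketched roughly as follows. This is only an illustration, not the code in this pull request: the function name sketch_feature_clustering_selection, its signature, the use of SciPy average-linkage clustering on 1 - |correlation|, the first principal component as the cluster summary, and the ratio form (1 - R2_own) / (1 - R2_nearest) are all assumptions borrowed from the usual variable-clustering convention.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def first_component(block: pd.DataFrame) -> np.ndarray:
    """First principal component scores of the standardized columns."""
    z = ((block - block.mean()) / block.std(ddof=0)).to_numpy()
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0] * s[0]


def r2(x, y) -> float:
    """Squared Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def sketch_feature_clustering_selection(df, features, n_clusters):
    """Hypothetical helper: cluster features, keep one per cluster."""
    # Step 1: hierarchical clustering on the distance 1 - |correlation|.
    abs_corr = df[features].corr().abs().to_numpy()
    dist = np.clip(1.0 - abs_corr, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    clusters = {}
    for feature, label in zip(features, labels):
        clusters.setdefault(label, []).append(feature)
    # Assumption: each cluster is summarized by its first principal component.
    components = {lab: first_component(df[cols]) for lab, cols in clusters.items()}

    # Step 2: in each cluster, keep the feature with the lowest 1-R2 ratio.
    selected = []
    for label, cols in clusters.items():
        ratios = {}
        for col in cols:
            r2_own = r2(df[col], components[label])
            r2_nearest = max(
                (r2(df[col], comp) for lab, comp in components.items() if lab != label),
                default=0.0,  # only one cluster: nothing to penalize
            )
            ratios[col] = (1.0 - r2_own) / max(1.0 - r2_nearest, 1e-12)
        selected.append(min(ratios, key=ratios.get))
    return selected


# Synthetic data: two independent signals, each observed with three noise levels.
rng = np.random.default_rng(0)
n = 2000
a, b = rng.normal(size=n), rng.normal(size=n)
demo = pd.DataFrame({
    "a1": a + 0.05 * rng.normal(size=n),
    "a2": a + 0.5 * rng.normal(size=n),
    "a3": a + 1.0 * rng.normal(size=n),
    "b1": b + 0.05 * rng.normal(size=n),
    "b2": b + 0.5 * rng.normal(size=n),
    "b3": b + 1.0 * rng.normal(size=n),
})
selected = sketch_feature_clustering_selection(demo, list(demo.columns), n_clusters=2)
print(sorted(selected))
```

On this synthetic data the sketch should keep the least noisy feature of each group, since that feature tracks its cluster component most closely while sharing almost nothing with the other cluster. Each cluster needs at least three features for the own-cluster R2 to discriminate, because with only two standardized features both correlate equally with their first principal component.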

Description of the changes proposed in the pull request

This commit adds the feature selection method feature_clustering_selection to fklearn/tuning/model_agnostic_fc.py.

Where should the reviewer start?

The reviewer should start with the method feature_clustering_selection in src/fklearn/tuning/model_agnostic_fc.py. The test test_feature_clustering_selection in fklearn/tests/tuning/test_model_agnostic_fc.py illustrates how the method is used.