nubank / fklearn

fklearn: Functional Machine Learning
Apache License 2.0

split_evaluator_extractor not fully compliant with split_evaluator #151

Closed mlikoga closed 3 years ago

mlikoga commented 3 years ago

Problem description

split_evaluator has an optional parameter, eval_name, which defaults to None. When eval_name is set, split_evaluator_extractor, which is meant to extract information from the logs generated by split_evaluator, returns NaN for every metric. It only works in the default case.
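A minimal sketch of the likely cause, in plain Python with no fklearn dependency. It assumes split_evaluator writes one log entry per split value under the key `<eval_name>_<split_value>`, with eval_name defaulting to `split_evaluator__<split_col>`, and that the extractor always looks up the default form. The helper `split_log_key` is hypothetical and only illustrates that assumed naming scheme.

```python
def split_log_key(split_col, split_value, eval_name=None):
    """Build a log key under the ASSUMED naming scheme (not taken from fklearn's API)."""
    # Assumption: the prefix falls back to 'split_evaluator__<split_col>' when eval_name is None.
    prefix = eval_name if eval_name is not None else "split_evaluator__" + split_col
    return prefix + "_" + str(split_value)


# Keys the extractor is assumed to look up (always the default prefix).
expected_keys = [split_log_key("split", v) for v in [0, 1]]

# Keys assumed to be written when eval_name="named_eval" is passed.
named_keys = [split_log_key("split", v, eval_name="named_eval") for v in [0, 1]]

# Every key the extractor looks for is absent from the named logs.
missing = [k for k in expected_keys if k not in named_keys]

print(expected_keys)
print(named_keys)
print(missing)
```

If that assumption holds, none of the expected keys exist in the named logs, which would explain the NaN values.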

Code sample

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
from fklearn.validation.evaluators import split_evaluator, roc_auc_evaluator
from fklearn.metrics.pd_extractors import split_evaluator_extractor, evaluator_extractor

loaded_data = load_breast_cancer()
size = len(loaded_data['target'])
dataset = pd.DataFrame(loaded_data['data']).assign(
  target=loaded_data['target'],
  prediction=np.random.rand(size),
  split=np.random.randint(2, size=size)
)

eval_fn = roc_auc_evaluator(target_column='target', eval_name='roc_auc')
base_extractor = evaluator_extractor(evaluator_name='roc_auc')

# Default case
split_fn_default = split_evaluator(eval_fn=eval_fn, split_col="split")
default_logs = split_fn_default(dataset)
default_metrics = split_evaluator_extractor(default_logs, split_col="split", base_extractor=base_extractor, split_values=[0,1])
print(default_metrics)

#    roc_auc  split_evaluator__split
#0  0.488234                       0
#0  0.448012                       1

# Bug case - with eval_name
split_fn_named = split_evaluator(eval_fn=eval_fn, split_col="split", eval_name="named_eval")
named_logs = split_fn_named(dataset)
named_metrics = split_evaluator_extractor(named_logs, split_col="split", base_extractor=base_extractor, split_values=[0,1])
print(named_metrics)

#   roc_auc  split_evaluator__split
#0      NaN                       0
#0      NaN                       1

Expected behavior

split_evaluator_extractor should be able to extract logs generated by split_evaluator with an eval_name.

Possible solutions

Add an eval_name parameter to split_evaluator_extractor, just like temporal_split_evaluator_extractor has.
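As a rough illustration of this proposal, here is a plain-Python stand-in, not the fklearn implementation. The function `extract_split_metrics` is hypothetical. The log layout (`<eval_name>_<split_value>` keys mapping to metric dicts) is an assumption, and the metric values are placeholders rather than real results.

```python
def extract_split_metrics(logs, split_col, split_values, metric_name, eval_name=None):
    """Hypothetical stand-in for an extractor that accepts eval_name."""
    # Assumption: the default prefix mirrors the one used when eval_name is None.
    prefix = eval_name if eval_name is not None else "split_evaluator__" + split_col
    rows = []
    for value in split_values:
        # A missing key yields an empty log, so the metric falls back to NaN.
        split_log = logs.get(prefix + "_" + str(value), {})
        rows.append({metric_name: split_log.get(metric_name, float("nan")), prefix: value})
    return rows


# Placeholder logs shaped like the assumed output of a named split evaluator.
named_logs = {
    "named_eval_0": {"roc_auc": 0.49},
    "named_eval_1": {"roc_auc": 0.45},
}

# Without eval_name the lookup misses every key; with it, the metrics are found.
without_name = extract_split_metrics(named_logs, "split", [0, 1], "roc_auc")
with_name = extract_split_metrics(named_logs, "split", [0, 1], "roc_auc", eval_name="named_eval")

print(without_name)
print(with_name)
```

Keeping None as the default would leave the current behavior unchanged for callers that do not name their evaluator.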

caique-lima commented 3 years ago

Can you add a code snippet to reproduce the problem?

mlikoga commented 3 years ago

Sure! I updated the description with code.