Description:

There is currently no way to generate the SQL that builds a metric table.

Tasks:

[ ] machine_learning/metrics/classification.py: Create a way to get the underlying SQL of the metrics
[x] machine_learning/metrics/regression.py: Add a parameter to regression_report that returns the SQL of the metric instead of the computed result.
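
A minimal sketch of what the regression task describes, assuming a flag that switches between returning the query and running it. The function name `mean_squared_error_sketch`, the `return_sql` flag, and the `execute` callable are all hypothetical and are not taken from VerticaPy's actual API:

```python
from typing import Callable, Optional, Union


def mean_squared_error_sketch(
    y_true: str,
    y_score: str,
    input_relation: str,
    return_sql: bool = False,  # hypothetical flag, not a confirmed VerticaPy parameter
    execute: Optional[Callable[[str], float]] = None,  # stand-in for a database call
) -> Union[str, float]:
    """Build the metric query once, then either return it or run it."""
    query = f"SELECT AVG(POWER({y_true} - {y_score}, 2)) FROM {input_relation}"
    if return_sql:
        # Caller asked for the SQL text only; nothing is executed.
        return query
    if execute is None:
        raise ValueError("An execute callable is required when return_sql is False.")
    return execute(query)
```

Because the query is built before the branch, the SQL that is returned and the SQL that is executed cannot drift apart.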

Definition of Done:

SQL code generation is possible for regression and classification.

Concerns:

An example showing that classification metrics are no longer computed purely in SQL:

How accuracy_score used to be computed in _metrics.py in VerticaPy 0.12.0: AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)
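
To make the old approach concrete, here is a small sketch that fills the quoted expression with column names and runs it. The `SELECT ... FROM` wrapper, the helper names, and the use of an in-memory SQLite table in place of Vertica are assumptions made only for illustration:

```python
import sqlite3

# Expression quoted above for VerticaPy 0.12.0.
ACCURACY_EXPR = "AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)"


def accuracy_sql(y_true: str, y_score: str, input_relation: str) -> str:
    # The SELECT ... FROM wrapper is an assumption made for this sketch.
    return f"SELECT {ACCURACY_EXPR.format(y_true, y_score)} FROM {input_relation}"


def run_on_sqlite(query: str, rows: list) -> float:
    """Run the query on a throwaway in-memory SQLite table (not Vertica)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE preds (y_true INTEGER, y_pred INTEGER)")
        conn.executemany("INSERT INTO preds VALUES (?, ?)", rows)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()
```

With this form the whole metric is a single SQL string, which is why exposing it was straightforward in the older code.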

How accuracy_score is computed now in [classification.py]() in 1.0.0:


    def accuracy_score(...):
        return _compute_final_score(
            _accuracy_score,
            **locals(),
        )

    def _accuracy_score(...):
        # Computed in Python from the confusion-matrix counts, not in SQL.
        return (tp + tn) / (tp + tn + fn + fp)

    def confusion_matrix(...) -> np.ndarray:
        # Only this step still runs SQL.
        res = _executeSQL(
            query=f"""
                SELECT
                    CONFUSION_MATRIX(obs, response
                        USING PARAMETERS num_classes = 2) OVER()
                FROM (SELECT
                        DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0) AS obs,
                        DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0) AS response
                      FROM {input_relation}) VERTICAPY_SUBTABLE;""",
            title="Computing Confusion matrix.",
            method="fetchall",
        )
        return np.round(np.array([x[1:-1] for x in res])).astype(int)

    def _compute_final_score(...):
        cm = confusion_matrix(y_true, y_score, input_relation, **kwargs)
        return _compute_final_score_from_cm(metric, cm, average=average, multi=multi)
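
One possible way to expose the SQL for classification metrics is to separate query construction from execution, so the confusion-matrix query can be returned on its own. The sketch below assumes this split; `confusion_matrix_sql` and `accuracy_from_cm` are hypothetical names, and only the query text comes from the code quoted in this issue:

```python
from typing import Sequence


def confusion_matrix_sql(
    y_true: str, y_score: str, input_relation: str, pos_label: str
) -> str:
    """Return the confusion-matrix query as text without executing it."""
    return (
        "SELECT CONFUSION_MATRIX(obs, response "
        "USING PARAMETERS num_classes = 2) OVER() "
        "FROM (SELECT "
        f"DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0) AS obs, "
        f"DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0) AS response "
        f"FROM {input_relation}) VERTICAPY_SUBTABLE;"
    )


def accuracy_from_cm(cm: Sequence[Sequence[int]]) -> float:
    """Accuracy from a 2x2 confusion matrix: correct predictions over all rows."""
    # The diagonal holds the correctly classified counts (tn and tp).
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return correct / total
```

Note that under this design the returned SQL yields only the confusion matrix. The final score is still a Python-side computation, which is the concern raised here.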

vertica / VerticaPy

[Pipeline] Underlying SQL Metrics #1099
