vertica / VerticaPy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
https://www.vertica.com/python/
Apache License 2.0

[Pipeline] Underlying SQL Metrics #1099

Open zacandcheese opened 8 months ago

zacandcheese commented 8 months ago

Description:

There is currently no way to generate the SQL to make a metric table.

Tasks:

Definition of Done:

Concerns:

An example to show that we no longer really use SQL to compute classification metrics:

def _accuracy_score(...):
    return (tp + tn) / (tp + tn + fn + fp)

def confusion_matrix(...) -> np.ndarray:
    res = _executeSQL(
        query=f"""
            SELECT
                CONFUSION_MATRIX(obs, response USING PARAMETERS num_classes = 2) OVER()
            FROM
                (SELECT
                    DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0) AS obs,
                    DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0) AS response
                 FROM {input_relation}) VERTICAPY_SUBTABLE;""",
        title="Computing Confusion matrix.",
        method="fetchall",
    )
    return np.round(np.array([x[1:-1] for x in res])).astype(int)

def _compute_final_score(...):
    cm = confusion_matrix(y_true, y_score, input_relation, **kwargs)
    return _compute_final_score_from_cm(metric, cm, average=average, multi=multi)
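For illustration only, here is a minimal sketch of what generating the SQL for one metric could look like, so that the score is computed in the database instead of in Python. The function name `accuracy_sql` and its parameters are hypothetical and not part of VerticaPy; the `DECODE` expressions are copied from the snippet above, and dropping rows where either label is NULL is an assumption, not confirmed behavior of the existing code. In the binary case, the share of rows where `obs = response` is the same quantity as `(tp + tn) / (tp + tn + fn + fp)`.

```python
def accuracy_sql(
    y_true: str, y_score: str, input_relation: str, pos_label: str = "1"
) -> str:
    """Hypothetical helper (not a VerticaPy API): build SQL for binary accuracy."""
    # Same 0/1 encoding as the confusion_matrix query quoted above.
    obs = f"DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0)"
    response = f"DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0)"
    return (
        # Fraction of rows whose prediction matches the observed label.
        "SELECT AVG(CASE WHEN obs = response THEN 1.0 ELSE 0.0 END) AS accuracy "
        f"FROM (SELECT {obs} AS obs, {response} AS response "
        f"FROM {input_relation}) VERTICAPY_SUBTABLE "
        # Assumption: rows with a NULL label or prediction are ignored.
        "WHERE obs IS NOT NULL AND response IS NOT NULL;"
    )


# Example: inspect the generated statement without executing it.
print(accuracy_sql("survived", "prediction", "public.titanic_preds"))
```

The function only returns a string, so the statement can be logged, reviewed, or embedded in a larger metric-table query before anything is run against the database.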

oualib commented 6 months ago

@zacandcheese did you find any solution for this one?