VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
There is currently no way to generate the SQL to make a metric table.
Tasks:
[ ] machine_learning/metrics/classification.py: Create a way to get the underlying SQL of the metrics.
[x] machine_learning/metrics/regression.py: Add a parameter to regression_report that returns the SQL of the metrics instead of their results.
Definition of Done:
SQL code generation is possible for regression and classification.
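One possible shape for the regression side, as a minimal sketch only: a helper that assembles the SQL for a few regression metrics and returns the string rather than running it. The function name `regression_metrics_sql`, its parameters, the chosen metrics, and the table and column names are hypothetical illustrations, not part of VerticaPy's API.

```python
def regression_metrics_sql(y_true: str, y_score: str, input_relation: str) -> str:
    """Hypothetical helper: build (but do not execute) SQL for a few regression metrics."""
    # Each metric is a plain SQL aggregate over the true and predicted columns.
    metrics = {
        "mean_absolute_error": f"AVG(ABS({y_true} - {y_score}))",
        "mean_squared_error": f"AVG(POWER({y_true} - {y_score}, 2))",
        "max_error": f"MAX(ABS({y_true} - {y_score}))",
    }
    select_list = ",\n    ".join(f"{expr} AS {name}" for name, expr in metrics.items())
    return f"SELECT\n    {select_list}\nFROM {input_relation};"


# Made-up column and table names, for illustration only.
sql = regression_metrics_sql("y", "y_pred", "public.predictions")
print(sql)
```

Returning the query as a string leaves the caller free to inspect it, log it, or run it through whatever connection they already hold.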
Concerns:
An example showing that classification metrics are no longer computed entirely in SQL:
How accuracy_score used to be computed in _metrics.py in VerticaPy 0.12.0:

    AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)

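To make the template concrete, here is a small sketch that fills in the two placeholders and wraps the expression in a full query. The helper name, column names, and table name are made up for illustration, and SQLite from the Python standard library stands in for Vertica only to show that the expression evaluates to the fraction of matching rows.

```python
import sqlite3

ACCURACY_TEMPLATE = "AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)"


def accuracy_sql(y_true: str, y_score: str, input_relation: str) -> str:
    """Hypothetical helper: fill the template and wrap it in a SELECT."""
    return f"SELECT {ACCURACY_TEMPLATE.format(y_true, y_score)} FROM {input_relation}"


# In-memory SQLite table used purely as a stand-in for a Vertica relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preds (y INTEGER, y_pred INTEGER)")
conn.executemany("INSERT INTO preds VALUES (?, ?)", [(1, 1), (0, 1), (1, 1), (0, 0)])

query = accuracy_sql("y", "y_pred", "preds")
accuracy = conn.execute(query).fetchone()[0]
print(query)
print(accuracy)  # 3 of the 4 rows match, so this prints 0.75
```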
How accuracy_score is computed now in [classification.py]() in 1.0.0:

    def _accuracy_score(...):
        return (tp + tn) / (tp + tn + fn + fp)


    def confusion_matrix(...) -> np.ndarray:
        res = _executeSQL(
            query=f"""
            SELECT
                CONFUSION_MATRIX(obs, response
                USING PARAMETERS num_classes = 2) OVER()
            FROM
                (SELECT
                    DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0) AS obs,
                    DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0) AS response
                 FROM {input_relation}) VERTICAPY_SUBTABLE;""",
            title="Computing Confusion matrix.",
            method="fetchall",
        )
        return np.round(np.array([x[1:-1] for x in res])).astype(int)


    def _compute_final_score(...):
        cm = confusion_matrix(y_true, y_score, input_relation, **kwargs)
        return _compute_final_score_from_cm(metric, cm, average=average, multi=multi)
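
A rough sketch of how the confusion-matrix path could still expose its SQL: build the query in one function and let the caller decide whether to run it, then compute the score from the returned counts. `confusion_matrix_sql` and `accuracy_from_cm` are hypothetical names, the query text mirrors the excerpt above, and the matrix layout `[[tn, fp], [fn, tp]]` is an assumption made for this example.

```python
def confusion_matrix_sql(y_true: str, y_score: str, input_relation: str, pos_label: str) -> str:
    """Hypothetical helper: return the confusion-matrix query instead of executing it."""
    return (
        "SELECT CONFUSION_MATRIX(obs, response USING PARAMETERS num_classes = 2) OVER() "
        f"FROM (SELECT DECODE({y_true}, '{pos_label}', 1, NULL, NULL, 0) AS obs, "
        f"DECODE({y_score}, '{pos_label}', 1, NULL, NULL, 0) AS response "
        f"FROM {input_relation}) VERTICAPY_SUBTABLE;"
    )


def accuracy_from_cm(cm: list) -> float:
    """Accuracy from a 2x2 matrix assumed to be laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = cm
    return (tp + tn) / (tp + tn + fn + fp)


# Made-up names and counts, for illustration only.
sql = confusion_matrix_sql("y", "y_pred", "public.predictions", "1")
cm = [[50, 10], [5, 35]]
print(sql)
print(accuracy_from_cm(cm))
```

With this split, only the confusion matrix has a SQL form; the final score is plain arithmetic on its four counts, which is why a single SQL expression per classification metric no longer falls out of the code directly.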