saulpw / aipl

Array-Inspired Pipeline Language

multiple reductions on data #27

Open dovinmu opened 1 year ago

dovinmu commented 1 year ago

I'm working on a binary classification script, and at the end I want to compute multiple stats on the results to see how well the model(s) did. But if I compute e.g. precision as a 1.5=>0 reduction, then I can't use the same data to compute recall. Here's how I'm getting around that now:

import numpy as np

# defop, Table, and to_np_int_array come from the aipl operator API.

def recall(predictions: np.ndarray, true_values: np.ndarray) -> float:
    # recall = TP / (TP + FN)
    TP = ((predictions == 1) & (true_values == 1)).sum()
    FN = ((predictions == 0) & (true_values == 1)).sum()
    return TP / (TP + FN)

def precision(predictions: np.ndarray, true_values: np.ndarray) -> float:
    # precision = TP / (TP + FP)
    TP = ((predictions == 1) & (true_values == 1)).sum()
    FP = ((predictions == 1) & (true_values == 0)).sum()
    return TP / (TP + FP)

@defop('compute-accuracy', 1.5, 0.5)
def compute(aipl, t: Table, predictions_colname, true_values_colname) -> dict:
    # pull both columns out as int arrays so several reductions can
    # be computed from the same data inside one op
    true_values = to_np_int_array(t, true_values_colname)
    predictions = to_np_int_array(t, predictions_colname)
    r = recall(predictions, true_values)
    print(r)
    p = precision(predictions, true_values)
    print(p)
    return {
        'recall': r,
        'precision': p,
    }

But it would be awesome to figure out a better and more general way of supporting these operations.
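
One direction that might work (just a sketch; the `compute-metrics` op and the `METRICS` registry are hypothetical, not existing AIPL API): keep each metric as a plain function over the two column arrays, and have a single 1.5=>0.5 op fan the same arrays out to all registered metrics, so no single reduction consumes the data:

```python
# Sketch only: a hypothetical op that applies several named reductions
# to the same pair of columns and returns them as one row.
METRICS = {
    'recall': recall,
    'precision': precision,
}

@defop('compute-metrics', 1.5, 0.5)
def compute_metrics(aipl, t: Table, predictions_colname, true_values_colname) -> dict:
    true_values = to_np_int_array(t, true_values_colname)
    predictions = to_np_int_array(t, predictions_colname)
    # every metric sees the same arrays, so nothing is "used up" by the first reduction
    return {name: fn(predictions, true_values) for name, fn in METRICS.items()}
```

That still bundles everything into one op, though; it would be nicer if each metric could stay its own small operator and the pipeline itself could apply several of them to the same table.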