kunaldahiya / pyxclib

Tools for multi-label classification problems.

Threshold-based Evaluation Metrics #33

Closed NitinAggarwal1 closed 11 months ago

NitinAggarwal1 commented 1 year ago

While using the fast_evaluate examples, we have a method to specify K, and it outputs the metrics.

This internally calls the top-k method to build the metrics report.

import scipy.sparse as sp
from xclib.utils.sparse import topk

Y_pred = pred_labels
Y_pred = Y_pred.tocsr()
Y_pred.sort_indices()
pad_indx = Y_pred.shape[1]
print(pad_indx)
indices_, values_ = topk(Y_pred, 6, pad_indx, 0, return_values=True, use_cython=False)
print(indices_[0])  # [2967 2970 2963 2976 2977 1866]
print(values_[0])   # [0.8342234 0.56523454 0.20331156 0.19142145 0.15245992 0.13709748]

In most cases we will not know the number of relevant labels, so we would instead set a threshold based on relevance. In this example, a threshold of 0.50 would keep only 2967 and 2970. Do we have a way to set a threshold and then calculate the metrics?
kunaldahiya commented 1 year ago

Hi Nitin

The existing metrics are mostly computed at k. Do you have a specific metric in mind? You can also consider applying a threshold (t) as follows:

pred.data[pred.data < t] = 0
pred.eliminate_zeros()
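For illustration, here is a minimal, self-contained sketch of that thresholding step on a toy score matrix (the scores below are made up for this example and are not from the issue above):

import numpy as np
import scipy.sparse as sp

# Toy prediction scores: 2 instances x 5 labels (assumed values for illustration)
pred = sp.csr_matrix(np.array([
    [0.83, 0.00, 0.56, 0.20, 0.10],
    [0.05, 0.75, 0.00, 0.45, 0.60],
]))

t = 0.50  # relevance threshold

# Zero out scores below the threshold, then drop the explicit zeros
pred.data[pred.data < t] = 0
pred.eliminate_zeros()

print(pred.toarray())
# [[0.83 0.   0.56 0.   0.  ]
#  [0.   0.75 0.   0.   0.6 ]]

The thresholded matrix can then be passed to the existing metric functions as usual; since labels scoring below t have been removed, they no longer appear among the top-k predictions when the metrics at k are computed.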