mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Compute metrics with selected threshold #422

Open pplonski opened 3 years ago

pplonski commented 3 years ago

Based on the discussion in https://github.com/mljar/mljar-supervised/discussions/418, it would be helpful to compute all metrics with the same threshold value.

neilmehta31 commented 2 years ago

Hi, is the issue still open? I would like to work on this feature. I'm new to the open-source world, so please help me with the info.

pplonski commented 2 years ago

Hi @neilmehta31, great that you would like to help with this issue. Yes, it is still open.

The problem is that many metrics are currently reported in the model's README.md, but each is computed at its own threshold value. Maybe we can add one more table that reports all metrics at the same threshold (I think we should use the threshold computed for the eval_metric or for Accuracy).

@neilmehta31 please ask if you have any questions.
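To make the idea concrete, here is a minimal sketch of evaluating every threshold-dependent metric at one shared threshold. This is not the package's actual implementation; the function names and the accuracy-maximizing threshold search below are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import (
    log_loss, roc_auc_score, f1_score, accuracy_score,
    precision_score, recall_score, matthews_corrcoef,
)


def metrics_at_threshold(y_true, y_proba, threshold):
    """Evaluate all metrics at one shared threshold (illustrative sketch)."""
    y_pred = (np.asarray(y_proba) >= threshold).astype(int)
    return {
        # logloss and auc are computed from probabilities and do not use a threshold
        "logloss": log_loss(y_true, y_proba),
        "auc": roc_auc_score(y_true, y_proba),
        "f1": f1_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }


def best_accuracy_threshold(y_true, y_proba):
    """Pick the candidate threshold that maximizes accuracy (brute-force sketch)."""
    candidates = np.unique(y_proba)
    scores = [
        accuracy_score(y_true, (np.asarray(y_proba) >= t).astype(int))
        for t in candidates
    ]
    return candidates[int(np.argmax(scores))]
```

With such helpers, the extra table could simply reuse the threshold returned by `best_accuracy_threshold` for every row instead of a per-metric optimum.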

neilmehta31 commented 2 years ago

Thanks, @pplonski, for the reply. Multiple algorithms are used and each has its own README.md, so a table with the same threshold for all metrics should be added to each algorithm's README.md, right? I will start the work right away.

pplonski commented 2 years ago

@neilmehta31 yes, there should be an additional table in each README.md.

neilmehta31 commented 2 years ago

Hey @pplonski, I have made the necessary changes and added the additional metrics table, using the threshold computed for Accuracy. Could you please tell me if anything else should be added or edited in the table? I am attaching one for your reference. If everything looks fine, I will make a PR.

Metric details

| Metric    | Score    | Threshold   |
|-----------|----------|-------------|
| logloss   | 0.33148  | nan         |
| auc       | 0.903267 | nan         |
| f1        | 0.688375 | 0.35753     |
| accuracy  | 0.846847 | 0.4854      |
| precision | 0.976471 | 0.974946    |
| recall    | 1        | 3.88177e-06 |
| mcc       | 0.58214  | 0.35753     |

Metric details with same threshold value (Accuracy threshold)

| Metric    | Score    | Threshold |
|-----------|----------|-----------|
| logloss   | 0.33148  | nan       |
| auc       | 0.903267 | nan       |
| f1        | 0.67546  | 0.4854    |
| accuracy  | 0.846847 | 0.4854    |
| precision | 0.689582 | 0.4854    |
| recall    | 0.661905 | 0.4854    |
| mcc       | 0.575497 | 0.4854    |

Confusion matrix (at threshold=0.4854)

|                  | Predicted as <=50K | Predicted as >50K |
|------------------|--------------------|-------------------|
| Labeled as <=50K | 4197               | 438               |
| Labeled as >50K  | 497                | 973               |
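For context, a confusion matrix like the one above can be produced by binarizing the predicted probabilities at the chosen threshold. A small illustrative sketch follows; the function name and label handling are assumptions, not mljar-supervised internals.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix


def labeled_confusion_matrix(y_true, y_proba, threshold, labels=("<=50K", ">50K")):
    """Binarize probabilities at `threshold` and return a labeled confusion matrix.

    Assumes y_true is encoded as 0/1 in the same order as `labels`.
    """
    y_pred = (np.asarray(y_proba) >= threshold).astype(int)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return pd.DataFrame(
        cm,
        index=[f"Labeled as {label}" for label in labels],
        columns=[f"Predicted as {label}" for label in labels],
    )


# Example usage with the threshold from the table above:
# labeled_confusion_matrix(y_true, y_proba, threshold=0.4854)
```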
pplonski commented 2 years ago

@neilmehta31 looks very good! :+1:

Maybe I will change the title 'Metric details with same threshold value (Accuracy threshold)' to one of:

- 'Metric details with threshold from accuracy metric'
- 'Metric details with threshold=0.4854'

@neilmehta31 please select the title version. I'm waiting for a PR from you.

neilmehta31 commented 2 years ago

@pplonski Could you please tell me what you mean by selecting the title version? Excited to contribute to the repo!

pplonski commented 2 years ago

Which title would be better for the table: 'Metric details with threshold from accuracy metric' or 'Metric details with threshold=0.4854'?

neilmehta31 commented 2 years ago

I think 'Metric details with threshold from accuracy metric' would be better, as the threshold value can be seen in the table anyway. What do you say? I'll update it before making the PR.