GeorgePearse opened 11 months ago
I'd like to help by submitting a PR
Hi @GeorgePearse and @RigvedRocks 👋🏻! Thanks for your interest in supervision. I'm sorry I haven't been responsive recently. Before Christmas I was busy with duties unrelated to supervision, and I've been off for the last few days.
The idea looks interesting. @RigvedRocks could you share some initial ideas regarding implementation?
I was thinking of using basic ML techniques such as the ROC curve or Youden's J statistic, but the approach outlined above by @GeorgePearse works for me. I'm happy to collaborate with @GeorgePearse on this issue if he'd like.
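For reference, Youden's J just picks the threshold that maximises TPR - FPR on the ROC curve. A minimal, purely illustrative sketch (not the proposed supervision API; it assumes each prediction has already been flagged as matched/unmatched at some IoU, and the arrays below are made up):

import numpy as np
from sklearn.metrics import roc_curve

# y_true: 1 if the prediction matched a ground-truth box at the chosen IoU, else 0
# y_score: the confidence score of each prediction (illustrative values only)
y_true = np.array([1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.75, 0.6, 0.4, 0.55, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
youden_j = tpr - fpr
best_threshold = thresholds[np.argmax(youden_j)]
print(best_threshold)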
I'd really like to do what I can to keep this ticking over. @SkalskiP, do you also think it's valuable? I'm always surprised by the lack of open-source implementations for this, and assume that every company just has its own fix.
@RigvedRocks how about this: I try to create a branch with a "workable" solution based on fdet-api, but starting from the supervision format, and you take it from there? Let me know if that might interest you.
@josephofiowa I'm also curious to hear your thoughts. I used to do this with some Voxel51 code (they have a method from which you can get all of the matching predictions for a given IoU), but it was painfully slow.
I keep assuming a "good" solution must exist, but I think the emphasis on threshold-agnostic metrics (mAP, etc.) in academia means this doesn't get much attention.
Hi @GeorgePearse 👋🏻 I like the idea and I'd love to look at your initial implementation. If possible, I want the solution:
Such a solution requires a lot of steps, so I need to understand how we can combine it with what we have and how to design the next elements to be as reusable as possible. We will also need to come up with a better name for this task and a better name for the feature. Haha
Yeah, that all makes sense. Tbh, the reason I want it integrated into supervision is to solve those very problems. At the minute I'm dealing with a lot of opaque code, and I only trust its outputs because I've visually inspected the predictions from lots of model/threshold combos that used it.
As for the API questions, just something like:
# Ideally target_metric could also be a callable, so that a user could customise
# exactly what they want to optimize for.
per_class_thresholds: dict = optimize_thresholds(
    predictions_dataset,
    annotations_dataset,
    target_metric='f1_score',
    per_class=True,
    minimum_iou=0.75,
)
And what is stored inside per_class_thresholds? Dict[int, float] - class id to optimal IoU mapping?
What's inside optimize_thresholds? I'd appreciate any pseudocode.
Class id to optimal score; the minimum IoU to classify a prediction and annotation as a match is set up front by the user. Isn't that by far the more common use case for shipping ML products? The minimum IoU is defined by business/product requirements and can easily enough be checked visually on a handful of examples. Maybe I'm biased by having mostly trained models where localisation is of secondary importance to classification, and a much, much easier problem.
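So the returned mapping would just look something like this (class ids and values are made up, purely to illustrate the shape of the output):

# Hypothetical output: class id -> confidence threshold that maximises the
# target metric at the fixed minimum IoU.
per_class_thresholds = {
    0: 0.42,
    1: 0.61,
    2: 0.35,
}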
Complete pseudocode:
import numpy as np
import pandas as pd

metrics = []
for class_name in class_list:
    for threshold in np.linspace(0, 1, 100):
        # calculate_metric is pseudocode: compute e.g. F1 for this class at this
        # score threshold from the grid of matched predictions and their scores.
        current_metric = calculate_metric(
            grid_of_matched_predictions_and_their_scores,
            class_name=class_name,
            score_threshold=threshold,
            metric='f1_score',
        )
        metrics.append({
            'threshold': threshold,
            'class_name': class_name,
            'metric': current_metric,
        })

metrics_df = pd.DataFrame(metrics)
# Keep the best-scoring row per class (the .groupby() style query mentioned above).
best_metrics = metrics_df.loc[metrics_df.groupby('class_name')['metric'].idxmax()]
But everything probably needs to be calculated in numpy so it's not painfully slow.
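A rough sketch of what that vectorised version could look like, assuming you already have, per class, the confidence score of every prediction and a boolean flag for whether it matched a ground-truth box at the minimum IoU (all names here are hypothetical):

import numpy as np

def best_threshold_for_class(scores, is_match, num_gt, thresholds=None):
    # scores: (N,) confidence of every prediction for this class
    # is_match: (N,) bool, True if the prediction matched a GT box at the minimum IoU
    #           (assumes one-to-one matching, i.e. at most one prediction per GT box)
    # num_gt: total number of ground-truth boxes for this class
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)

    # keep[i, j] is True if prediction j survives threshold i
    keep = scores[None, :] >= thresholds[:, None]     # (T, N)
    tp = (keep & is_match[None, :]).sum(axis=1)       # true positives per threshold
    fp = (keep & ~is_match[None, :]).sum(axis=1)      # false positives per threshold
    fn = num_gt - tp                                  # ground truths left unmatched

    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-9)

    best = int(np.argmax(f1))
    return thresholds[best], f1[best]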
There's a decent chance that this is where most people currently get this data from: https://github.com/rafaelpadilla/Object-Detection-Metrics. But the repo is what you'd expect of something 5-6 years old, and doesn't have the usability/documentation of a modern open-core project.
This is what using fdet-api looks like for me at the minute:
import pandas as pd

# cocoEval is an fdet-api evaluator set up earlier on the predictions/annotations.
thresholds = []
thresholds_dict = {}
f1_score_dict = {}
for counter, class_name in enumerate(annotation_class_names):
    (
        class_name,
        fscore,
        conf,
        precision,
        recall,
        support,
    ) = cocoEval.getBestFBeta(
        beta=1, iouThr=0.5, classIdx=counter, average="macro"
    )
    class_threshold_dict = {
        "class_name": class_name,
        "fscore": fscore,
        "conf": conf,
        "precision": precision,
        "recall": recall,
        "support": support,
    }
    f1_score_dict[class_name] = fscore
    thresholds.append(class_threshold_dict)
    thresholds_dict[class_name] = conf

thresholds_df = pd.DataFrame(thresholds)
print(thresholds_df)
So I end up with both the threshold that optimises the metric I care about, and the metrics that threshold achieves.
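And applying those thresholds at inference time is just a per-prediction lookup, e.g. on a supervision Detections object. Hedged sketch: it assumes the thresholds are re-keyed by class id, and apply_per_class_thresholds is a made-up helper.

import numpy as np
import supervision as sv

def apply_per_class_thresholds(detections: sv.Detections, thresholds: dict, default: float = 0.5):
    # thresholds: class id -> minimum confidence to keep a prediction
    keep = np.array([
        confidence >= thresholds.get(class_id, default)
        for class_id, confidence in zip(detections.class_id, detections.confidence)
    ])
    return detections[keep]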
Understood. This sounds interesting to me. I'm worried about scope, especially if we want to reimplement all metrics.
@GeorgePearse Fine by me. You can create a new branch and then I can refine your initial solution.
From a look through the metric functionality already implemented, it looks like it wouldn't be too painful to add. The object that comes out of it looks like it has already done most of the upfront maths needed. Hard for me to tell just from a look, though: does the output structure contain the scores?
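If it does, a naive first version could just sweep the confidence threshold and recompute the existing metrics each time. Very much a sketch: it assumes ConfusionMatrix.from_detections keeps its conf_threshold / iou_threshold arguments, and f1_from_confusion_matrix is a made-up helper.

import numpy as np
import supervision as sv

# predictions / targets: lists of sv.Detections, classes: list of class names.
best_threshold, best_f1 = 0.0, 0.0
for conf_threshold in np.linspace(0.05, 0.95, 19):
    confusion_matrix = sv.ConfusionMatrix.from_detections(
        predictions=predictions,
        targets=targets,
        classes=classes,
        conf_threshold=float(conf_threshold),
        iou_threshold=0.5,
    )
    f1 = f1_from_confusion_matrix(confusion_matrix.matrix)  # made-up helper
    if f1 > best_f1:
        best_threshold, best_f1 = float(conf_threshold), f1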
@GeorgePearse, could you be a bit more specific? What do you mean by scores?
Search before asking
Description
Create a simple API to find the best thresholds to maximise some metric (f1-score, precision, recall), given an annotated dataset and a model.
At the minute I use the below, because it's the only repo I've found that calculates what I need in a reasonable time frame.
https://github.com/yhsmiley/fdet-api
Use case
Anyone wanting to deploy models without manual thresholding (or viewing graphs).
Additional
No response
Are you willing to submit a PR?