roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License
24.21k stars 1.8k forks source link

`Detections.empty()` invalidates detections, causes crashes when `Detections.merge()` is called. #943

Open LinasKo opened 8 months ago

LinasKo commented 8 months ago

Search before asking

This is the underlying reason for #928. I believe it is important enough to deserve a separate issue.

Bug

Detectons.empty() creates a very specific type of Detections, which is guaranteed to be incompatible with some models when merging results.

Example

Suppose we run an instance segmentation model on a video. It finds one detection in frame 1, but no detections in frame 2. When we try calling Detections.merge on these detections, it raises an error.

IMAGE_PATH = "cat.png"
image = cv2.imread(IMAGE_PATH)
black_image = np.zeros_like(image, dtype=np.uint8)

model = get_roboflow_model(model_id="yolov8n-640")        # Error type 1
# model = get_roboflow_model(model_id="yolov8n-seg-640")  # Error type 2

result = model.infer(image)[0]
detections_1 = sv.Detections.from_inference(result)

result = model.infer(black_image)[0]
detections_2 = sv.Detections.from_inference(result)

sv.Detections.merge([detections_1, detections_2])
Error type 1 ValueError Traceback (most recent call last) in () 16 detections_2 = sv.Detections.from_inference(result) 17 ---> 18 sv.Detections.merge([detections_1, detections_2]) 1 frames /usr/local/lib/python3.10/dist-packages/supervision/detection/core.py in merge(cls, detections_list) 768 tracker_id = stack_or_none("tracker_id") 769 --> 770 data = merge_data([d.data for d in detections_list]) 771 772 return cls( /usr/local/lib/python3.10/dist-packages/supervision/detection/utils.py in merge_data(data_list) 629 all_keys_sets = [set(data.keys()) for data in data_list] 630 if not all(keys_set == all_keys_sets[0] for keys_set in all_keys_sets): --> 631 raise ValueError("All data dictionaries must have the same keys to merge.") 632 633 for data in data_list: ValueError: All data dictionaries must have the same keys to merge.
Error type 2 ValueError Traceback (most recent call last) [](https://localhost:8080/#) in () 18 19 ---> 20 sv.Detections.merge([detections_1, detections_2]) 1 frames [/usr/local/lib/python3.10/dist-packages/supervision/detection/core.py](https://localhost:8080/#) in stack_or_none(name) 756 return None 757 if any(d.__getattribute__(name) is None for d in detections_list): --> 758 raise ValueError(f"All or none of the '{name}' fields must be None") 759 return ( 760 np.vstack([d.__getattribute__(name) for d in detections_list]) ValueError: All or none of the 'mask' fields must be None

Deeper explanation

Detections contains these variables:

xyxy: np.ndarray
mask: Optional[np.ndarray] = None
confidence: Optional[np.ndarray] = None
class_id: Optional[np.ndarray] = None
tracker_id: Optional[np.ndarray] = None
data: Dict[str, Union[np.ndarray, List]] = field(default_factory=dict)

Suppose we call Detections.merge([detections_1, detections_2]). An error is rightfully raised when the same variable is defined differently in the detections. For example - when mask is None in detections_1 and np.array([]) in detections_2. It makes sense as we don't want to merge incompatible detections.

However, when calling Detections.empty(), frequently used to initialize the no-detections case, it sets a specific subset of the fields - only the xyxy, condifence and class_id. Many models use other fields as well. Because of this, the set of defined fields in an empty detection might be different to what the model returns. When trying to merge these, an error is raised.

Environment

No response

Minimal Reproducible Example

See aforementioned example.

this Colab contains examples, list of functions affected.

Are you willing to submit a PR?

I could help submit a PR when I figure out a solution, but this is a non-trivial bug and to my best estimates, solving it 'correctly' would require a redesign of Detections. Quick remediation may mean band-aid code and hope that future features are written defensively when merge is involved.

LinasKo commented 8 months ago

Affected code

The good news is - the error mostly appears when Detections.merge is called on results from unusual models - where confidence is missing, where data is set, where mask exists (and there's no detections at least once).

Affected functions:

See Colab for more details.

LinasKo commented 8 months ago

Mitigation Options

TL;DR: I think Option 3 is the least-bad solution. Adds complexity in exactly one location, keeps design and architecture intact, stops errors, prevents future worries. I also advocate for two eventual changes regardless of option taken (final section)

Here's how I see various mitigation options and their downsides. I can only speculate about the design goals and future plans of Roboflow, and I do not have access to the internal message board. So my goal now is to bring other devs up to speed and help think through the solutions.

My Assumptions

  1. Detections is meant to be universal and easily-understood, fitting both detection and segmentation models.
  2. Therefore, it must hold undefined variables, ready to take values from different model types.
  3. There's a need for Detections.empty, which represents empty detections of ANY type.

Caveats

  1. I have done little research on the data field, but found an example where it fails for the same reason.
  2. I may make generalizations for which models are used less frequently than others. This comes from observing their presence in the code, and how Roboflow prioritizes them on the site.

Option 1: Do nothing

The error has two stages:

  1. Calling Detections.empty from specific model loaders sets it up - this happens frequently.
  2. Calling Detections.merge triggers it - this happens rarely.

merge is:

Maybe it's not worth fixing? There haven't been many complaints.

The devs just need to be extra careful of using merge in anything new, largely relating to instance segmentation. Who knows - maybe there's not that many features left unimplemented.

Option 2: Forbid everything affected.

(The worst approach, in my opinion) Very radical - comment out merge and inference slicer, disallow smoothing. Who cares if there's incompatible objects around - they're not interacting with each other anyway. (I care).

Option 3: Change merge.

I like how merge works, given the assumptions. But maybe it's harmful, even if correct?

If (xyxy, confidence, class_id) merges with (xyxy, mask, class_id), maybe find a way to let it happen? Form (xyxy, mask, confidence, class_id) somehow.

(Brainstorming) When we need new values for:

Option 4: Explicitly define what's being initialized.

empty sets (xyxy, confidence, class_id). What is that? A detection model.

Let's make it initialize (xyxy, mask, class_id) for segmentation models. Some won't conform - SAM returns no class_id. So force it, make a mock id! This way you'd have two distinct model types.

Then, either pass an egument to Detections.empty to tell which one you're initializing, or define two different empty functions. For the outliers - full field set detections and those models with data - either set the fields explicitly, or add a third, more flexible Detections.empty. Same thing.

Did you spot an issue?

Suppose you have an intermediary function. A call chain of: Smoother -> some_func -> specific_empty.

Which type of empty should it call? Somehow, with an argument or context of some sort, some_func needs to know which type of variables to initialize / which empty to call. At present, if there's no detections such an intermediary would only receive an empty list - []. No type information!

That, or you need to forbid intermediaries from accepting [], return a specific empty() from the top-level caller instead.

Here's a thought experiment: try inlining Detections.empty. I mean, replace it with the Detections constructor and then explicitly set variables inside. In each case, can you tell which ones should be set? Try this inside Detections.merge to see the issue.

Option 4b: Frickin' Templates

We're passing the type information with function calls, elsewhere that would be done via templates or generics. Detections.empty<Det>(), Detections.empty<Seg>() might be completely normal in other languages.

Type parameter syntaxis only introduced in 3.12. Perhaps TypeVar, could help, but generally this would make the code less accessible to new devs.

Option 5: Multiple Classes

Did you notice that we're approaching a multi-class solution? My claim is that Detections can be at least two distinct objects - the detection and the segmentation result.

An option is to change the core design, split them in two, then define strict non-nullable members, adjust everything else to match that.

Detections becomes a base class passed to users and annotators, those have to frequently check contents. Notice that with nullable members similar checks might already be happening (I haven't checked).

Downside:

Option 6: Get rid of nullable types

This is my least thought-out option. Null values serve a purpose, representing invalid members - E.g. "This Detections result will never have confidence". This requires extra validation, disallows safely having a strong Detections.empty function such as the one we have now.

Perhaps Optional status can be revoked. There needs to be logic such that whether there's an empty mask or never any mask would look the same. Is there harm to that?

I'm think so. Null checks can be removed in places like __iter__(), but in a for loop, is there a way for users to easily tell which fields might suddenly produce values?

Essential Changes

I advocate for these two changes to be made eventually.

  1. Detections.empty is meant to construct empty detections for ANY model. Therefore it HAS to be weaker-defined than any model. The code now sets (xyxy, confidence, class_id). Does every model have at least all of these? No. Trim it down until it can be used as the constructor for Detections. Otherwise you'll be setting fields that normal model results may never have and constructing invalid objects. After this is done - can Detections constructor be used instead of Detections.empty()?
  2. Visually separate tracker_id from other members. Is is addressed differently, has a special set of methods ensuring its existence and is not produced by the models. A single empty line would give a hint to maintainers of its special status - miniscule thing, but it would help learn the code faster.
LinasKo commented 6 months ago

We've now implemented option 3 in https://github.com/roboflow/supervision/pull/1177

Ideally, I'd like to see an attempt at trimming down Detections.empty to just xyxy before closing this.

Would that cause issues? Would that be easy fine? Would that entail too much work to be worth it? I'd like to know that.