v7labs / darwin-py

Library and commandline tool for managing datasets on darwin.v7labs.com
MIT License

Complex Polygons ignored in Instance Segmentation Dataset (Torch) #423

Closed monforte-dt closed 1 year ago

monforte-dt commented 1 year ago

The instance segmentation dataset contains code to handle complex polygons (as indicated by this line), but the annotation type it passes to super() is polygon, which causes LocalDataset.parse_json to ignore complex polygons.

When parsing the JSON file, annotations are filtered by type using (line):

annotations = [a for a in annotations if a["name"] in self.classes and self.annotation_type in a]

Since self.annotation_type equals "polygon", an annotation a that has a complex_polygon field (rather than a polygon field) is skipped. One possible solution is:
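To make the behaviour concrete, here is a minimal standalone reproduction of the filter quoted above. The annotation dicts are hypothetical and only loosely shaped like Darwin JSON (an assumption); only the filter expression itself comes from the issue.

```python
# Hypothetical annotations, loosely modelled on Darwin JSON (assumed shape).
classes = ["cat"]
annotation_type = "polygon"

annotations = [
    {"name": "cat", "polygon": {"path": [{"x": 0, "y": 0}, {"x": 4, "y": 0}, {"x": 0, "y": 4}]}},
    {"name": "cat", "complex_polygon": {"path": [[{"x": 0, "y": 0}, {"x": 4, "y": 0}, {"x": 0, "y": 4}]]}},
]

# Filter as quoted in the issue: `annotation_type in a` checks for a dict key
# named exactly "polygon", so the complex_polygon annotation never matches.
kept = [a for a in annotations if a["name"] in classes and annotation_type in a]

print(len(kept))  # 1: the complex_polygon annotation is silently dropped
```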

annotations = [a for a in annotations if a["name"] in self.classes and any(self.annotation_type in k for k in a)]

Here the substring "polygon" is contained in both polygon and complex_polygon, so this should also work for any future derived or complementary annotation types (e.g. box would cover box_xyxy and box_xywh). However, it adds the risk that an annotation type whose name is a substring of some other annotation key matches unintentionally (e.g. any annotation type that is a substring of "id" would let every annotation pass the filter).
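The trade-off described above can be checked directly. The sketch below runs the proposed substring filter on hypothetical annotations, shows the "id" pitfall, and, for comparison, a hypothetical alternative (not from darwin-py) that matches against an explicit set of accepted keys instead of substrings.

```python
def filter_substring(annotations, classes, annotation_type):
    # Proposed fix from the issue: keep an annotation if ANY of its keys
    # contains annotation_type as a substring.
    return [a for a in annotations
            if a["name"] in classes and any(annotation_type in k for k in a)]


def filter_explicit(annotations, classes, accepted_keys):
    # Hypothetical alternative (an assumption, not darwin-py code): keep an
    # annotation only if one of its keys is exactly in accepted_keys.
    return [a for a in annotations
            if a["name"] in classes and any(k in accepted_keys for k in a)]


# Hypothetical annotations; every one carries an "id" key.
annotations = [
    {"id": "a1", "name": "cat", "polygon": {}},
    {"id": "a2", "name": "cat", "complex_polygon": {}},
    {"id": "a3", "name": "cat", "bounding_box": {}},
]

by_polygon = filter_substring(annotations, ["cat"], "polygon")  # a1, a2
by_id = filter_substring(annotations, ["cat"], "id")            # a1, a2, a3 (pitfall)
explicit = filter_explicit(annotations, ["cat"], {"polygon", "complex_polygon"})  # a1, a2

print([a["id"] for a in by_polygon], [a["id"] for a in by_id], [a["id"] for a in explicit])
```

An explicit key set avoids accidental substring matches at the cost of having to list each new annotation type by hand.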

martluide commented 1 year ago

I can confirm that this is a serious issue, and I wasn't able to fix it with this code change either.

Edit: the fix above is incompatible with the recent change to get_dataset: https://github.com/v7labs/darwin-py/blob/519574265ac6313c4cb1199b8e70304f146f2402/darwin/torch/dataset.py#L313

Without this line, it is possible to make it work.

I've only just realized that I wasted 2 months of developer time trying to fix the AI when the dataloader doesn't even work properly. I've had to fix other bugs in V7 code before and had hoped I was done with that, but apparently not! I am absolutely not happy right now. My dataset includes complex polygons only in situational cases, in images with many objects, so I expected this to be an AI problem; only now do I realize the trainer never even saw these polygons!

It's absolutely awful that complex polygons are not passed to the dataloader and trainer by default. Complex polygons should be combined in the dataloader into a single mask so that we can use them as intended.
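Combining the paths of a complex polygon into one mask can be sketched as follows. This is a minimal pure-Python illustration, not darwin-py's implementation: it assumes each path is a list of {"x", "y"} dicts (modelled on a complex_polygon "path" list) and uses the even-odd fill rule, so a path nested inside another becomes a hole. Whether darwin-py uses this fill rule is not stated in the thread; a real dataloader would use a faster rasterizer.

```python
def rasterize_paths(paths, width, height):
    """Rasterize several closed paths into ONE binary mask (even-odd rule).

    paths: list of paths, each a list of {"x": float, "y": float} dicts
    (assumed shape). Returns mask[row][col] with values 0 or 1.
    """
    # Collect the edges of every path, closing each path back to its start.
    edges = []
    for path in paths:
        pts = [(p["x"], p["y"]) for p in path]
        for i in range(len(pts)):
            edges.append((pts[i], pts[(i + 1) % len(pts)]))

    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        cy = row + 0.5  # sample at the pixel centre
        for col in range(width):
            cx = col + 0.5
            inside = False
            for (x0, y0), (x1, y1) in edges:
                # Does a ray cast rightwards from (cx, cy) cross this edge?
                # Horizontal edges never pass this test, so no division by zero.
                if (y0 > cy) != (y1 > cy):
                    x_cross = x0 + (cy - y0) * (x1 - x0) / (y1 - y0)
                    if cx < x_cross:
                        inside = not inside  # each crossing toggles in/out
            mask[row][col] = 1 if inside else 0
    return mask


# Usage: a 6x6 outer square with a 2x2 hole, combined into a single mask.
outer = [{"x": 0, "y": 0}, {"x": 6, "y": 0}, {"x": 6, "y": 6}, {"x": 0, "y": 6}]
hole = [{"x": 2, "y": 2}, {"x": 4, "y": 2}, {"x": 4, "y": 4}, {"x": 2, "y": 4}]
mask = rasterize_paths([outer, hole], 6, 6)

for r in mask:
    print("".join(str(v) for v in r))
```

Because every path contributes edges to the same crossing count, the result is one instance mask rather than one mask per path.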

rafalzv7labs commented 1 year ago

Hi @monforte-dt & @martluide, thanks for raising the issue! Apologies that we haven't addressed it earlier. We've shared the details with our engineers, assigned it high priority, and we're looking into it. We'll keep you both posted on any updates.

Nathanjp91 commented 1 year ago

@monforte-dt and @martluide, can you both confirm which version of darwin-py you're using? One of our developers has taken a look and can't reproduce the issue on the 0.8.7 release; it appears to have been resolved by a bug fix from late December. If you're on the latest version and still having issues, please let us know.

owencjones commented 1 year ago

Closing for now, as we haven't heard back. If users find themselves affected by this, please add a comment to this issue and we will re-open it for internal investigation.

Many thanks

Owen