keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras
Other
1k stars 332 forks

Run Length Encoding/Decoding utility #327

Closed: innat closed this issue 1 year ago

innat commented 2 years ago

Describe the feature

Run-length encoding (RLE) is an annotation format used mostly for segmentation tasks. It would be nice to have a utility function that encodes a segmentation mask to RLE and decodes RLE back into a mask.

(For anyone interested: What is RLE - Issue; How it works)

How will the API change?


from keras_cv.utils import rle_encode
from keras_cv.utils import rle_decode

Reference Implementation

Here are a few:

LukeWood commented 2 years ago

Great to know about this format @innat. Thanks for the reference! I will read up on this more in the coming weeks as we get closer to being ready to begin segmentation map support.

innat commented 2 years ago

Here is a nice walkthrough of the COCO annotation format; it explains RLE quite well. Placing it here for future reference.

sayakpaul commented 2 years ago

Here's the exact timestamp where the RLE part starts: https://youtu.be/h6s61a_pqfM?t=687

Thanks @innat!

ayulockin commented 2 years ago

Thanks for sharing this @innat.

To summarise quickly: the RLE format consists of pairs of a start position and a run length. E.g. '1 3' means starting at pixel 1 and running for a total of 3 pixels (1, 2, 3).

An RLE is usually stored as a string of space-separated integers that alternate between start position and run length. Thus the RLE '1 3 10 5' means that pixels 1, 2, 3, 10, 11, 12, 13, 14 are included in the mask. Note also that the pixel numbering usually starts from the top-left corner of the image.
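The decoding rule described above fits in a few lines of NumPy. This is a minimal sketch, not a KerasCV API: `rle_decode` and its `order` parameter are hypothetical, and it assumes 1-indexed start positions, as in the '1 3 10 5' example. "Starts from the top-left corner" leaves the scan direction open, so the sketch defaults to row-major numbering and accepts `order="F"` for column-major numbering (top to bottom, then left to right), which some Kaggle datasets use instead.

```python
import numpy as np


def rle_decode(rle: str, height: int, width: int, order: str = "C") -> np.ndarray:
    """Decode a space-separated RLE string into a binary (height, width) mask.

    Hypothetical helper. Assumes 1-indexed (start, length) pairs with pixels
    numbered from the top-left corner; order="C" is row-major, "F" column-major.
    """
    values = [int(v) for v in rle.split()]
    if len(values) % 2 != 0:
        raise ValueError("RLE must contain (start, length) pairs")
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(values[0::2], values[1::2]):
        # Shift the 1-indexed start to a 0-indexed slice and fill the run.
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape((height, width), order=order)


mask = rle_decode("1 3 10 5", height=4, width=4)
# 1-indexed positions of the pixels that ended up in the mask.
print((np.flatnonzero(mask.ravel()) + 1).tolist())  # [1, 2, 3, 10, 11, 12, 13, 14]
```

Getting the scan order wrong does not raise an error; it silently produces a transposed-looking mask, so it is worth checking which convention a given dataset uses.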

Something I have seen in a few Kaggle competitions: given an image with 5 classes to segment, there are 5 RLEs, one per class, like [rle1, rle2, rle3, ...]. The per-class masks can overlap, which complicates things a bit, but I think we should start with encoding 2D masks.
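One simple way to keep overlaps in the multi-class case is to decode each class's RLE into its own channel instead of merging everything into a single label map. The sketch below assumes one RLE string per class, in class order, with an empty string for an absent class. `decode_per_class_rles` is a hypothetical helper, and the decoder uses the same 1-indexed, row-major convention as the '1 3 10 5' example.

```python
import numpy as np


def rle_decode(rle: str, height: int, width: int) -> np.ndarray:
    """Decode a 1-indexed, row-major RLE string into a binary (H, W) mask."""
    values = [int(v) for v in rle.split()]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(height, width)


def decode_per_class_rles(rles, height: int, width: int) -> np.ndarray:
    """Stack one RLE per class into an (H, W, num_classes) mask.

    Each class gets its own channel, so a pixel claimed by two classes
    stays set in both channels instead of being overwritten.
    """
    return np.stack([rle_decode(r, height, width) for r in rles], axis=-1)


# Hypothetical annotations: classes 0 and 1 overlap on pixels 2 and 3.
masks = decode_per_class_rles(["1 3", "2 2", ""], height=2, width=3)
print(masks.shape)  # (2, 3, 3)
```

Collapsing such a stack into a single 2D label map (for example with `argmax` over the last axis) forces each overlapping pixel into one class, which is where the complication with overlaps comes from.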

sayakpaul commented 2 years ago

Yeah, 2D masks are preferable to start with.

But I think that before RLE, users would want support for other, more industry-standard formats, such as segmentation maps stored as images, polygon coordinates, etc.

@LukeWood WDYT?

ayulockin commented 2 years ago

I might be wrong, but I think RLE is an industry standard; I'm basing this on multiple Kaggle competitions.

I would say RLE is ideal for storing masks since it usually takes less memory. But for fast training, dense segmentation masks are better suited. Thus format conversion utilities will come in handy. Nevertheless, we should support segmentation maps and polygons before RLE.
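To make the conversion point concrete, here is a minimal NumPy sketch of the mask-to-RLE direction, with a matching decoder for a round trip. `rle_encode` and `rle_decode` are hypothetical helpers: they mirror the names proposed at the top of this issue but are not a KerasCV API. They assume 1-indexed starts and row-major pixel numbering from the top-left corner, with `order="F"` for column-major datasets.

```python
import numpy as np


def rle_encode(mask: np.ndarray, order: str = "C") -> str:
    """Encode a binary (H, W) mask as a space-separated RLE string.

    Hypothetical helper: emits 1-indexed (start, length) pairs, numbering
    pixels from the top-left corner (row-major by default).
    """
    flat = (np.asarray(mask).ravel(order=order) > 0).astype(np.int8)
    # Pad with a zero on each side so every run has a visible start and end.
    pixels = np.concatenate([[0], flat, [0]])
    # 1-indexed run starts, alternating with positions one past each run's end.
    runs = np.flatnonzero(pixels[1:] != pixels[:-1]) + 1
    # Turn each (start, one-past-end) pair into (start, length).
    runs[1::2] -= runs[0::2]
    return " ".join(str(x) for x in runs)


def rle_decode(rle: str, height: int, width: int, order: str = "C") -> np.ndarray:
    """Inverse of rle_encode under the same conventions."""
    values = [int(v) for v in rle.split()]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape((height, width), order=order)


# Round trip on a hypothetical 256x256 mask containing one rectangle.
mask = np.zeros((256, 256), dtype=np.uint8)
mask[100:150, 80:200] = 1
rle = rle_encode(mask)
print(len(rle.split()) // 2)  # 50 runs, one per covered row
print(np.array_equal(rle_decode(rle, 256, 256), mask))  # True
```

Here the RLE holds 100 integers instead of 65,536 pixel values, which illustrates the storage argument; masks with many small, fragmented regions compress much less well, and training code would typically decode back to a dense mask anyway.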