Is there a semantic segmentation network or instance segmentation network based on Efficient-Det?

bilel-bj commented 4 years ago

You said that EfficientDet performs well in semantic segmentation, but we could not see how it works for semantic or instance segmentation. Is a segmentation model planned for release?

bluesky314 commented 4 years ago

+1 Very much interested in this

znoop333 commented 4 years ago

+1 I want to use this too. In the paper, Section 5.2, "EfficientDet for Semantic Segmentation", describes a different network from what is implemented in this code. How do you "only use P2 for the final per-pixel classification"? I would have to dig through the model to find that tensor. Where is the detector_masks output in the Python code? It looked promising, but I don't see where it is populated.

JVGD commented 4 years ago

I am also trying to build EfficientDet for semantic segmentation. From what I understand, they just add level P2 to the multi-scale feature levels: {P2, P3, P4, P5, P6, P7}. They also refer to the Panoptic FPN paper, where the feature fusion and upsampling for segmentation are done as illustrated in that paper's figure.

However, they say two things that confused me:

"we only use P2 for the final per-pixel classification" (what about the feature upsampling and fusion referred to earlier?)
"We set the channel size to 128 for BiFPN and 256 for classification head. Both BiFPN and classification head are repeated by 3 times." (it seems as if they were doing the segmentation with the classification prediction head only)

I will see if I can try the 2 approaches:

The feature upsample & fusion from panoptic FPN after the BiFPN output
Using the classification head just on P2 + upsample for the segmentation

bonlime commented 4 years ago

@JVGD I was also wondering how to use EffDet for segmentation and found this issue. I think that, because the BiFPN layers already fuse features across scales, additional feature fusion like in Panoptic FPN is not needed.

I'm also not sure whether the P6 and P7 levels are needed. EffDet is very close to RetinaNet, where max_level for classification is set to 5:
https://github.com/tensorflow/tpu/blob/master/models/official/retinanet/retinanet_segmentation_model.py#L238 Also, in RetinaNet min_level is set to 3, which is definitely not enough for detecting small objects.

I'm currently working on PyTorch implementation and would do the following:

take P2, P3, P4, P5
pass through 3 layers of BiFPN
apply classification head to P2 only
upsample to match input resolution
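
The steps above can be traced as plain shape arithmetic. This is only a sketch under stated assumptions: the 512x512 input, the 21 classes and the helper names `level_size` and `trace_shapes` are hypothetical choices for illustration, and the 128 BiFPN channels are taken from the paper quote earlier in the thread. It relies on the usual convention that level Pk has stride 2**k relative to the input.

```python
# Shape trace for the P2-P5 plan; all concrete sizes are illustrative assumptions.

def level_size(input_size: int, level: int) -> int:
    """Spatial size of pyramid level P<level>, whose stride is 2**level."""
    stride = 2 ** level
    if input_size % stride != 0:
        raise ValueError(f"input size {input_size} is not divisible by stride {stride}")
    return input_size // stride


def trace_shapes(input_size, num_classes, levels=(2, 3, 4, 5), fpn_channels=128):
    """Return (BiFPN output shapes per level, logits shape, upsample factor)."""
    # BiFPN keeps the spatial size of every level and uses one channel width.
    fpn_shapes = {}
    for lvl in levels:
        side = level_size(input_size, lvl)
        fpn_shapes[f"P{lvl}"] = (fpn_channels, side, side)

    # The classification head runs on the finest level only (P2 here).
    finest = level_size(input_size, min(levels))
    logits_shape = (num_classes, finest, finest)

    # Factor needed to bring the logits back to the input resolution.
    upsample_factor = input_size // finest
    return fpn_shapes, logits_shape, upsample_factor


fpn_shapes, logits_shape, factor = trace_shapes(input_size=512, num_classes=21)
print(fpn_shapes)
print(logits_shape, factor)
```

With these assumptions the P2 logits are 128x128 and need a x4 upsample, which is why dropping P2 (as with min_level 3) would leave a stride-8 map as the finest prediction.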

Maybe I'll add additional fusion with P1. It may help improve quality for small objects, but it would be a deviation from the original paper.

JVGD commented 4 years ago

Very interesting, @bonlime, thank you for the detailed explanation. Curiously, I ended up developing an architecture very similar to what you propose. My approach was:

Modify BiFPN to accept one more level (P2 to P7)
Run the EfficientDet backbone + BiFPN
Take only level P2 from outputs of BiFPN
Run the P2 output through the classification branch of the retina head
Upsampling (x4) with a transposed convolution + standard conv (for smoothing upsampled map)
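
For the last step, the sizes can be checked with the standard output-size formulas for transposed and regular convolutions (the same ones given in the PyTorch `ConvTranspose2d` and `Conv2d` documentation). This is a sketch, not the configuration actually used: the kernel, stride and padding values are assumptions, since they are not stated above, and the helper names are hypothetical.

```python
# Output-size arithmetic for "transposed conv (x4) + standard conv".
# Kernel/stride/padding values below are assumed, not taken from the thread.

def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a regular convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


p2_size = 128  # P2 map for a 512x512 input (stride 4)

# Option A: kernel equals stride, so output patches do not overlap.
up_a = conv_transpose_out(p2_size, kernel=4, stride=4)

# Option B: a larger overlapping kernel, with padding chosen to keep exactly x4.
up_b = conv_transpose_out(p2_size, kernel=8, stride=4, padding=2)

# A 3x3 conv with padding 1 smooths the map without changing its size.
smoothed = conv_out(up_a, kernel=3, padding=1)

print(up_a, up_b, smoothed)
```

Both options return the map to the input resolution; which one gives fewer checkerboard artifacts is an empirical question that this arithmetic does not answer.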

As you say, it did not make sense to use the Panoptic FPN feature fusion + upsampling, because the BiFPN already addresses multi-scale fusion. I feed levels P2-P7 into the BiFPN even though I only use its P2 output: since all levels are fused in the BiFPN, my thinking is that the P2 output can benefit from the semantically rich P6-P7 feature maps.

Regarding the max_level=5 setting in RetinaNet: although we use the classification branch from the RetinaNet prediction head, we use it for a very different purpose, so using just P2 is OK. We want this block to classify pixels, not regions (anchors). Let's see whether these assumptions hold once the training finishes.
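
To make the anchors-versus-pixels distinction concrete, here is a small counting sketch. It assumes a 512x512 input, detection levels P3-P7 and 9 anchors per location (3 scales x 3 aspect ratios); these values and the function names are illustrative assumptions, not settings confirmed in this thread.

```python
# How many predictions each kind of head makes; all settings are assumptions.

def anchor_count(input_size, min_level=3, max_level=7, anchors_per_cell=9):
    """Total anchors classified by a detection head over all pyramid levels."""
    total = 0
    for lvl in range(min_level, max_level + 1):
        side = input_size // 2 ** lvl  # level P<lvl> has stride 2**lvl
        total += side * side * anchors_per_cell
    return total


def pixel_count(input_size, level=2):
    """Positions classified by a per-pixel head applied to a single level."""
    side = input_size // 2 ** level
    return side * side


detection_predictions = anchor_count(512)          # boxes scored across P3-P7
segmentation_predictions = pixel_count(512, 2)     # P2 positions, before upsampling

print(detection_predictions, segmentation_predictions)
```

The detection head spreads its predictions over several coarse grids with multiple anchors per cell, while the segmentation use needs exactly one class vector per position on a single fine grid, which is consistent with applying the branch to P2 alone.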

fsx950223 commented 3 years ago

PYTHONPATH=./ python keras/segmentation.py

google / automl

Is there a semantic segmentation network or instance segmentation network based on Efficient-Det? #44