google / automl

Google Brain AutoML
Apache License 2.0

Is there a semantic segmentation network or instance segmentation network based on EfficientDet? #44

Closed bilel-bj closed 3 years ago

bilel-bj commented 4 years ago

You said that EfficientDet performs well on semantic segmentation, but we could not find how it is applied to semantic segmentation or instance segmentation. Is this planned for release?

bluesky314 commented 4 years ago

+1 Very much interested in this

znoop333 commented 4 years ago

+1 I want to use this too. In the paper, "5.2. EfficientDet for Semantic Segmentation" describes a different network than what's implemented in this code. How do you "only use P2 for the final per-pixel classification"? I'd have to start digging through the model to find the tensor for that. Where is the detector_masks output in the Python code? That looked promising, but I don't see where it is populated.

JVGD commented 4 years ago

I am also trying to build EfficientDet for semantic segmentation. From what I could understand, they just add level P2 to the multi-scale feature levels: {P2, P3, P4, P5, P6, P7}. They also refer to the Panoptic FPN paper, where the feature fusion & upsampling for segmentation is done as in this picture from the paper.

image

However they say 2 things that got me confused:

I will see if I can try the 2 approaches:

  1. The feature upsample & fusion from panoptic FPN after the BiFPN output
  2. Using the classification head just on P2 + upsample for the segmentation
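
The two approaches listed above can be compared at the level of tensor shapes with a rough NumPy sketch. It is not the repository's code: the 256x256 input size, the 64 channels, the random stand-in features, and the helper names are all assumptions made for illustration. Panoptic FPN also applies convolution stages with bilinear upsampling before summing, which is simplified here to nearest-neighbour upsampling followed by a sum.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) array by an integer factor."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_panoptic_style(features, target_level=2):
    """Approach 1: bring every level to the target level's resolution and sum."""
    fused = None
    for level, feat in features.items():
        up = upsample_nearest(feat, 2 ** (level - target_level))
        fused = up if fused is None else fused + up
    return fused

def take_p2(features):
    """Approach 2: rely on the BiFPN fusion and keep only the P2 output."""
    return features[2]

# Stand-in BiFPN outputs for an assumed 256x256 input with 64 channels.
rng = np.random.default_rng(0)
image_size, channels = 256, 64
features = {
    level: rng.standard_normal(
        (image_size // 2 ** level, image_size // 2 ** level, channels)
    )
    for level in range(2, 8)  # P2 .. P7
}

fused = fuse_panoptic_style(features)
p2_only = take_p2(features)
print(fused.shape, p2_only.shape)  # both are 1/4-scale maps: (64, 64, 64)
```

Both variants end with a feature map at 1/4 of the input resolution, so the same per-pixel classifier could be attached to either one.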

bonlime commented 4 years ago

@JVGD I was also wondering how to use EffDet for segmentation and found this issue. I think that, because of the BiFPN layers, additional feature fusion like in Panoptic FPN is not needed.

I'm also not sure whether the P6 and P7 levels are needed. EffDet is very close to RetinaNet, where max_level for classification is set to 5:
https://github.com/tensorflow/tpu/blob/master/models/official/retinanet/retinanet_segmentation_model.py#L238
Also, in RetinaNet min_level is set to 3, which is definitely not enough for detecting small objects.
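
To make the min_level / max_level discussion concrete, here is a small sketch of the feature-map size at each pyramid level. It assumes the usual FPN convention that level l has stride 2**l; the 512x512 input and the function name are hypothetical.

```python
def level_resolutions(image_size, min_level, max_level):
    """Map each pyramid level to its feature-map side length (stride 2**level)."""
    return {
        level: image_size // (2 ** level)
        for level in range(min_level, max_level + 1)
    }

# Assumed 512x512 input: levels starting at P3 versus levels starting at P2.
from_p3 = level_resolutions(512, min_level=3, max_level=7)
from_p2 = level_resolutions(512, min_level=2, max_level=7)

for level, side in from_p2.items():
    print(f"P{level}: stride {2 ** level:>3}, feature map {side}x{side}")
```

With min_level=3 the finest map is 64x64 for this input, while adding P2 gives 128x128, which is why the lower level matters for small objects and per-pixel output.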

I'm currently working on a PyTorch implementation and would do the following:

Maybe I'll add additional fusion with P1. It may help to improve quality for small objects but would be a deviation from the original paper.

JVGD commented 4 years ago

Very interesting @bonlime, thank you for the detailed explanation. Curiously, in the end I developed an architecture very similar to what you propose. My approach was:

As you say, it did not make sense to use the Panoptic FPN feature fusion + upsampling because the BiFPN already addresses the issue of multi-scale fusion. I feed levels P2-P7 into the BiFPN even though I only use its P2 output: since all levels are fused in the BiFPN, my thinking is that the P2 output can benefit from the semantically rich feature maps of the P6-P7 inputs.

Regarding the issue with max_level=5 in RetinaNet, I think that although we use the classification branch from the RetinaNet prediction head, we use it for a very different purpose, so using just P2 is OK. We do not want to use this block to classify regions (anchors) but pixels. Let's see whether these assumptions hold once the training finishes.
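
As a rough illustration of classifying pixels rather than anchors, the sketch below applies a 1x1 convolution (a per-pixel linear layer) to a stand-in P2 map and upsamples the logits by the P2 stride of 4. The channel count, the 21 classes, the random weights, and the function name are all hypothetical, and nearest-neighbour upsampling stands in for whatever interpolation a real model would use.

```python
import numpy as np

def segmentation_head(p2, weights, bias, stride=4):
    """Classify every P2 location, then upsample the logits to input resolution."""
    # A 1x1 convolution is a linear layer applied independently at each pixel.
    logits = p2 @ weights + bias  # (H/4, W/4, num_classes)
    # Nearest-neighbour upsampling back to the input resolution.
    logits = logits.repeat(stride, axis=0).repeat(stride, axis=1)
    # One class index per pixel, with no anchors involved.
    return logits.argmax(axis=-1)

rng = np.random.default_rng(0)
channels, num_classes = 64, 21  # hypothetical sizes
p2 = rng.standard_normal((64, 64, channels))  # stand-in P2 for a 256x256 input
weights = rng.standard_normal((channels, num_classes)) * 0.01
bias = np.zeros(num_classes)

mask = segmentation_head(p2, weights, bias)
print(mask.shape)  # (256, 256)
```

The output carries one label per input pixel, whereas a detection class head predicts scores for several anchors at every location of every level.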

fsx950223 commented 3 years ago

PYTHONPATH=./ python keras/segmentation.py