ifnspaml / SGDepth

[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
MIT License

Different Encoders for depth and segmentation. #12

Closed Ale0311 closed 3 years ago

Ale0311 commented 3 years ago

Hello,

I was wondering why did you choose to have the same encoder for depth and segmentation? Did you try with different ones as well? And how may I use different encoders for the two? Are there any restrictions concerning channels in the architecture? Or can I use entirely different encoders, like resnet and efficientnet, for example?

Thank you in advance!

PS: I tried to make some changes, but I get this error:

Traceback (most recent call last):
  File "train.py", line 376, in <module>
    trainer.train()
  File "train.py", line 344, in train
    self._run_epoch()
  File "train.py", line 253, in _run_epoch
    outputs = model(batch)
  File "/home/diogene/anaconda3/envs/torch_110/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/models/sgdepth.py", line 314, in forward
    x_depth = self._multi_batch_unpack(dims, *x_depth)
  File "/home/diogene/Documents/Alexandra/SGDepth-master/models/sgdepth.py", line 251, in _multi_batch_unpack
    for x in xs
  File "/home/diogene/Documents/Alexandra/SGDepth-master/models/sgdepth.py", line 251, in <genexpr>
    for x in xs
AttributeError: 'tuple' object has no attribute 'split'

I tried removing the line self.split = layers.ScaledSplit(*grad_scales) from SGDepthCommon, but then I get some dimension errors. I don't understand exactly what this split is supposed to do either.

Thank you again!

klingner commented 3 years ago

Hello,

I did choose the same encoder for depth and segmentation to be able to share the weights between both. This yielded improved results for both tasks in the experiments I conducted (cf. Tab. 3 of the paper).

Regarding the channels, I think there are no hard restrictions and you can in principle choose any backbone you like. However, if you change the number of scales at which features are passed to the decoder, or the number of channels at those scales, then the decoder might need to be adapted too. I did in fact use a VGG-16 backbone for some experiments recently, which was not hard to integrate because it produces features at the same number of scales as the ResNet models.
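As a hedged illustration of that compatibility requirement (the helper below is hypothetical, not part of the repository), a new backbone can be sanity-checked with a dummy forward pass before adapting the decoder:

```python
import torch

def check_encoder_compat(encoder, expected_channels, input_size=(1, 3, 192, 640)):
    """Run a dummy forward pass and verify that the backbone emits
    features at the scale count and channel counts the decoder expects.
    The input resolution here is only an illustrative default."""
    with torch.no_grad():
        feats = encoder(torch.zeros(input_size))
    assert len(feats) == len(expected_channels), "scale count mismatch"
    for f, c in zip(feats, expected_channels):
        assert f.shape[1] == c, f"channel mismatch: {f.shape[1]} vs {c}"
```

If this check fails for a new backbone, the decoder's per-scale channel configuration is the place to adapt.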

The split attribute that you refer to scales the gradients on the backward pass, weighting the influence of each task on the shared network parts (cf. Eq. 6 of the paper).
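For intuition, a minimal sketch of such gradient scaling (illustrative names, not the actual `ScaledSplit` implementation): the forward pass is the identity, and only the backward pass multiplies the gradient by a task-specific factor.

```python
import torch

class ScaleGrad(torch.autograd.Function):
    """Identity on the forward pass; multiplies the incoming gradient
    by a constant factor on the backward pass."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Scale the gradient flowing back into the shared encoder;
        # the scale constant itself needs no gradient.
        return grad_output * ctx.scale, None

def scaled_split(features, scale_depth, scale_seg):
    # Hand the same shared features to both task heads, each with
    # its own gradient weight on the shared parameters.
    return (ScaleGrad.apply(features, scale_depth),
            ScaleGrad.apply(features, scale_seg))
```

Both heads see identical features, but their gradients arrive at the shared encoder weighted by the chosen factors.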

Ale0311 commented 3 years ago

Hello,

Thank you for your response. I managed to change the backbone successfully, but only for depth and segmentation simultaneously. What I really want to achieve now is to have two different encoders, one for the depth task and one for the segmentation task. This is how I modified the SGDepth class:

        self.depth_encoder = SGDepthCommon(
            18, split_pos, (grad_scale_depth, grad_scale_seg),
            weights_init == 'pretrained'
            )

        self.seg_encoder = SGDepthCommon(
            34, split_pos, (grad_scale_depth, grad_scale_seg),
            weights_init == 'pretrained'
            )

        self.depth = SGDepthDepth( self.depth_encoder, resolutions_depth)
        self.seg = SGDepthSeg( self.seg_encoder)

        # The Pose network has its own Encoder ("Feature Extractor") and Decoder
        self.pose = SGDepthPose(
            num_layers_pose,
            weights_init == 'pretrained'
        )

The error that I mentioned above seems to appear in this function:

    def _multi_batch_unpack(self, dims, *xs):
        xs = tuple(
            tuple(x.split(dims))
            for x in xs
        )

        # xs, as of now, is indexed like this:
        # xs[ENCODER_LAYER][DATASET_IDX], the lines below swap
        # this around to xs[DATASET_IDX][ENCODER_LAYER], so that
        # xs[DATASET_IDX] can be fed into the decoders.
        xs = tuple(zip(*xs))

        return xs
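As a self-contained sketch of what this function does (not the repository code): batches from all datasets are concatenated along the batch dimension before the shared encoder, `split` cuts each feature map back into per-dataset chunks, and the `zip` transposes the nesting from xs[layer][dataset] to xs[dataset][layer]:

```python
import torch

def multi_batch_unpack(dims, *xs):
    # Each x is one encoder feature map computed on the joint batch;
    # split it back into one chunk per dataset along dim 0.
    xs = tuple(tuple(x.split(dims)) for x in xs)
    # Transpose the nesting: xs[layer][dataset] -> xs[dataset][layer].
    return tuple(zip(*xs))

# Two datasets with batch sizes 2 and 3, features at two scales.
dims = (2, 3)
scale1 = torch.zeros(5, 8)
scale2 = torch.zeros(5, 16)
per_dataset = multi_batch_unpack(dims, scale1, scale2)
```

Since `.split` is a tensor method, the AttributeError presumably means each element of `xs` was already a tuple (for instance, the two-way output of `ScaledSplit`) rather than a plain tensor.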

I just don't understand why it says the encoder doesn't have the 'split' attribute, because the SGDepthCommon class, which I use to create the encoders (ResNet-18 for depth and ResNet-34 for segmentation), contains this line:

self.split = layers.ScaledSplit(*grad_scales)

Maybe I don't need the split attribute anymore, because there is no shared encoder. Am I right?

Thank you in advance!

Ale0311 commented 3 years ago

So I managed to get it working by removing every reference to the split attribute and gradient scales in the SGDepthCommon class. I am curious about the results. Do you think this is the right approach?

Thank you!

klingner commented 3 years ago

I think it is totally fine to remove the split attribute, but I would then recommend thinking about weighting the losses in case there are still shared network parts. If the networks are completely separate, the split attribute is not necessary at all and the losses also do not need to be weighted, so your approach seems reasonable to me.
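For completeness, a minimal sketch of such loss weighting (the weight values here are illustrative, not from the paper):

```python
def total_loss(depth_loss, seg_loss, lambda_depth=0.5, lambda_seg=0.5):
    # With fully separate encoders the gradient-level split is gone,
    # but the two objectives can still be traded off at the loss level
    # wherever network parts remain shared.
    return lambda_depth * depth_loss + lambda_seg * seg_loss
```

In a shared-encoder setting these weights play the role the gradient scales did; with fully separate networks they are optional.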