How to visualize the influence of DC objects on the photometric loss?

ifnspaml / SGDepth

[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance

MIT License


How to visualize the influence of DC objects on the photometric loss? #33

Closed jsczzzk closed 2 years ago

jsczzzk commented 2 years ago

Hi, great work! I want to visualize the influence of DC objects on the photometric loss, as in Monodepth2. However, this may require the pre-trained semantic segmentation model. Is that right? If so, can you share it? Thank you!

klingner commented 2 years ago

I think the visualization shown is computed from the auto-masking mask and not from a segmentation network. Apart from that, the semantic segmentation model is part of the pre-trained SGDepth model, so you can also use it for segmentation inference.

I am not sure, though, whether this was your question. If it is not answered yet, can you rephrase it?

jsczzzk commented 2 years ago

Hi, thank you for your patient answer! Monodepth2 cannot handle DC objects (the persons in the red boxes of my visualization). In that visualization, white pixels denote that the loss comes from either the t+1 frame or the t-1 frame, and black pixels denote that the loss is excluded from the overall reprojection loss.

As a result, I want to use a semantic segmentation mask to exclude the DC objects, i.e., the persons in the red boxes, from the loss.

So, what should I do? Can you give me some hints? Thanks in advance!
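A minimal sketch of how such a white/black mask can be computed, assuming the auto-masking rule of Monodepth2: a pixel is kept when the minimum reprojection loss over the warped source frames is lower than the minimum loss over the unwarped source frames. The function name `automask` and the toy loss values are illustrative assumptions, not code from either repository.

```python
import numpy as np

def automask(reproj_losses, identity_losses):
    """Return 1 (white) where a pixel contributes to the loss, 0 (black) otherwise.

    reproj_losses:   (num_sources, H, W) per-pixel loss of each warped source frame
    identity_losses: (num_sources, H, W) per-pixel loss of each unwarped source frame
    """
    min_reproj = reproj_losses.min(axis=0)      # best warped source per pixel
    min_identity = identity_losses.min(axis=0)  # best unwarped source per pixel
    # Keep the pixel only if warping explains it better than doing nothing.
    return (min_reproj < min_identity).astype(np.uint8)

# Toy example: two source frames (t-1, t+1), image of size 1 x 2.
reproj = np.array([[[0.1, 0.5]], [[0.3, 0.4]]])
identity = np.array([[[0.2, 0.2]], [[0.6, 0.3]]])
mask = automask(reproj, identity)  # first pixel kept, second pixel excluded
```

Multiplying `mask` by 255 and saving it as an image gives a visualization of the kind described above.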

klingner commented 2 years ago

Hi, so with the code I supplied here, the moving objects can be completely excluded from the loss by using the arguments

--masking-enable --moving-mask-percent 1.0

However, in my experience, just excluding the objects completely from the loss leads to decreased performance. Does this help? I don't think I can give guidance on how to include the segmentation mask in the original Monodepth2 code, as quite a few changes are necessary. The code in this repository may, however, give you some guidance on how this can be achieved.

jsczzzk commented 2 years ago

Thank you for your patient answers!

jsczzzk commented 2 years ago

Hi, I'm sorry to trouble you again. I'm having trouble with the training code, and I'm wondering whether I'm doing this correctly.

I use the segmentation model to produce the dynamic mask for the -1, 0, and 1 frames. Then I use the warping operation to get the final dynamic mask using (8). [image: occ2]

Next, I use the final dynamic mask to exclude the loss of dynamic objects from the overall reprojection loss of Monodepth2.

In addition, according to your paper, this method cannot simultaneously detect both moving dynamic objects and non-moving dynamic objects in an image, e.g., the white parked car in the red box of the close-up view. Is that right?
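A minimal sketch of the mask combination described here, assuming the DC masks of the -1 and +1 frames have already been warped into the 0 frame (for example with nearest-neighbor sampling) and that a pixel is kept only if no frame marks it as DC. The function name `combine_dc_masks` and the toy masks are hypothetical, and this is not necessarily identical to (8) in the paper.

```python
import numpy as np

def combine_dc_masks(dc_target, dc_warped_sources):
    """Return a boolean keep-mask that is True where no DC object is present.

    dc_target:         (H, W) bool, True on DC pixels of frame 0
    dc_warped_sources: list of (H, W) bool, DC masks of frames -1 and +1,
                       assumed to be already warped into frame 0
    """
    dc_any = dc_target.copy()
    for warped in dc_warped_sources:
        dc_any |= warped  # a pixel is dynamic if any frame says so
    return ~dc_any

# Toy example on a 1 x 3 image.
dc_t = np.array([[True, False, False]])
dc_prev = np.array([[False, True, False]])
dc_next = np.array([[False, False, False]])
keep = combine_dc_masks(dc_t, [dc_prev, dc_next])  # only the last pixel is kept
```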
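One way to apply such a mask to a per-pixel reprojection loss is sketched below. It assumes the loss is averaged over the kept pixels only, which is a design choice and not something confirmed by either codebase; `masked_mean` and `eps` are hypothetical names.

```python
import numpy as np

def masked_mean(loss_map, keep_mask, eps=1e-7):
    """Average a per-pixel loss over the pixels where keep_mask is True.

    loss_map:  (H, W) float, per-pixel reprojection loss
    keep_mask: (H, W) bool, False on pixels to exclude (e.g. DC objects)
    eps:       avoids division by zero when every pixel is masked
    """
    keep = keep_mask.astype(loss_map.dtype)
    return float((loss_map * keep).sum() / (keep.sum() + eps))

# Toy example: only the last pixel contributes to the loss.
loss_map = np.array([[1.0, 2.0, 3.0]])
keep_mask = np.array([[False, False, True]])
loss = masked_mean(loss_map, keep_mask)
```

Dividing by the number of kept pixels rather than by the image size keeps the loss magnitude comparable between images with many and with few masked pixels.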

Finally, is it necessary to exclude the loss of distant dynamic objects?

I think these distant dynamic objects don't seem to move because they are too far from the camera.

Thanks in advance!

klingner commented 2 years ago

Hi again. About distant dynamic objects: I think they should be handled to some degree by the auto-masking technique of Godard et al. if they appear static, right? I think auto-masking has exactly this aim: to exclude objects that appear static.

jsczzzk commented 2 years ago

Thank you so much again!

I have one more question about the dynamic object masking method proposed in your paper. It seems to label a whole image as either static or dynamic. For example, all cars in an image will be considered either static or dynamic. If there are both moving and parked cars in an image, then there will be a problem. Is that right?

klingner commented 2 years ago

It is not necessarily a problem. Our method can be interpreted as a kind of data selection. We mask out the dynamic objects in the images where they violate the static world assumption the most. Therefore, the images where the objects are moving comparatively more are still used for training. Does this make sense? Of course, this does not work on an instance level within an image, so in some cases there will still be slight violations of the static world assumption.
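The data selection idea can be sketched as follows. This is a rough illustration under stated assumptions, not the repository's implementation: it assumes motion is measured per image by the IoU between the DC mask of the target frame and the warped DC mask of a source frame (low IoU meaning strong motion), and that a chosen fraction of images with the lowest IoU gets its DC objects masked. The names `dc_iou`, `select_images_to_mask`, and `mask_fraction` are hypothetical; `mask_fraction` only loosely mirrors the role of `--moving-mask-percent`.

```python
import numpy as np

def dc_iou(dc_target, dc_warped):
    """IoU of two boolean DC masks; 1.0 if neither contains a DC pixel."""
    union = np.logical_or(dc_target, dc_warped).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(dc_target, dc_warped).sum() / union)

def select_images_to_mask(ious, mask_fraction):
    """Flag the images whose DC objects should be masked out.

    ious:          one IoU per image (low IoU = DC objects moved a lot)
    mask_fraction: fraction of images to mask, between 0.0 and 1.0
    """
    ious = np.asarray(ious, dtype=float)
    if mask_fraction <= 0.0:
        return np.zeros(len(ious), dtype=bool)
    threshold = np.quantile(ious, mask_fraction)
    return ious <= threshold  # decision is per image, not per object instance

iou = dc_iou(np.array([[True, True, False]]), np.array([[False, True, True]]))
half = select_images_to_mask([0.9, 0.2, 0.6, 0.4], 0.5)
everything = select_images_to_mask([0.9, 0.2, 0.6, 0.4], 1.0)
```

Because the decision is made once per image, a parked car and a moving car in the same image receive the same treatment, which is the instance-level limitation mentioned above.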

jsczzzk commented 2 years ago

Thank you so much again! I think the dynamic object masking approach still has room for improvement. This will be the focus of my next research.