ifnspaml / SGDepth

[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
MIT License

Understanding the training procedure #4

Closed wwtu closed 3 years ago

wwtu commented 3 years ago

Thanks for your fantastic work; I am reading your paper and the code. There is one thing I can't understand: according to your paper, the training process seems to involve both the segmentation decoder and the depth decoder simultaneously, but the experimental setup section says that semantic segmentation is trained on the Cityscapes dataset and depth estimation is trained on the KITTI dataset, which indicates that the trainings of segmentation and depth are separate, right? Otherwise, do we need to use the segmentation mask of x_t in Fig. 2 as ground truth to train the segmentation decoder? Would you please say a little more about the training procedure?

In addition, if I want to train depth on my own dataset, how should I prepare it? (I would use my own dataset to train depth and the Cityscapes dataset to train segmentation.) Thanks.

klingner commented 3 years ago

Hi @wwtu, what basically happens is that both datasets (KITTI for depth and Cityscapes for segmentation) are loaded simultaneously. Batches from both datasets are sampled and concatenated. This concatenated batch is passed through the encoder, split back into its depth-specific and segmentation-specific parts, and then passed through the respective decoders. Afterwards, the losses are calculated only for the task-specific batches: the depth loss is calculated on the KITTI images and the segmentation loss on the Cityscapes images (using the segmentation masks as ground truth).
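To make this concrete, here is a minimal sketch of one such training step. The module names (`encoder`, `depth_decoder`, `seg_decoder`), the loss functions, and the batch keys are placeholders for illustration, not the actual SGDepth API:

```python
import torch

def train_step(encoder, depth_decoder, seg_decoder,
               depth_loss_fn, seg_loss_fn,
               kitti_batch, cityscapes_batch):
    # Concatenate the depth and segmentation images into one batch.
    images = torch.cat([kitti_batch['image'], cityscapes_batch['image']], dim=0)
    n_depth = kitti_batch['image'].shape[0]

    # Single forward pass through the shared encoder.
    features = encoder(images)  # assumed: a list of multi-scale feature maps

    # Split the features back into the task-specific parts.
    depth_feats = [f[:n_depth] for f in features]
    seg_feats = [f[n_depth:] for f in features]

    # Each decoder only sees the part of the batch belonging to its task.
    depth_pred = depth_decoder(depth_feats)
    seg_pred = seg_decoder(seg_feats)

    # Task-specific losses: self-supervised photometric loss on the KITTI
    # images, cross-entropy against the Cityscapes ground-truth masks.
    return (depth_loss_fn(depth_pred, kitti_batch)
            + seg_loss_fn(seg_pred, cityscapes_batch['mask']))
```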

If you would like to train on your own dataset for depth, you would mainly have to define a new dataloader in loaders/depth/train.py. Note that the function returns a loader that, when used, supplies a dictionary containing key-data pairs; you just need to make sure the correct keys are provided. You could also get familiar with our dataloader repository, which is linked in the README.
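As a rough illustration of such a dataset, here is a sketch of a `Dataset` that returns key-data pairs for self-supervised depth training (the target frame plus its temporal neighbors). The key names below are made up; check loaders/depth/train.py for the keys the training loop actually expects:

```python
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MyDepthDataset(Dataset):
    """Hypothetical monocular depth dataset returning key-data pairs."""

    def __init__(self, root, filenames, size=(192, 640)):
        self.root = root
        self.filenames = filenames  # frame paths in temporal order
        self.transform = transforms.Compose([
            transforms.Resize(size),
            transforms.ToTensor(),
        ])

    def _load(self, i):
        path = os.path.join(self.root, self.filenames[i])
        return self.transform(Image.open(path).convert('RGB'))

    def __len__(self):
        # Skip the first and last frame so both neighbors always exist.
        return len(self.filenames) - 2

    def __getitem__(self, idx):
        # Illustrative key names only; the real loader defines its own.
        return {
            ('color', 0): self._load(idx + 1),   # target frame x_t
            ('color', -1): self._load(idx),      # previous frame
            ('color', 1): self._load(idx + 2),   # next frame
        }
```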

wwtu commented 3 years ago

Hi @klingner, thanks for your quick reply. I will try to understand your points by reading the corresponding code. By the way, can you tell me how long the training process takes? (My PC has a single GeForce GTX 1070 GPU.) Thanks again.

klingner commented 3 years ago

For me the training process usually takes about 40 hours on a GeForce GTX 1080 graphics card. However, the bottleneck was actually the speed at which the images are loaded/preprocessed, so if you have more CPU cores and a faster storage device, it should be even faster.

wwtu commented 3 years ago

@klingner OK, thanks a lot.

chetanmreddy commented 3 years ago

> For me the training process usually takes about 40 hours on a GeForce GTX 1080 graphics card. However, the bottleneck was actually the speed at which the images are loaded/preprocessed, so if you have more CPU cores and a faster storage device, it should be even faster.

How can I make use of the additional CPU cores I have? The code doesn't seem to use all of them automatically. Is there an argument I need to pass?

klingner commented 3 years ago

My understanding is that the num_workers argument passed to PyTorch's DataLoader class sets the number of worker processes per DataLoader instance, and that these are automatically distributed across the CPU cores. Please correct me if I am wrong here, as I am no expert on that topic.
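For example, a sketch of the relevant DataLoader construction; `dataset` and the batch size stand in for whatever the loader actually wraps:

```python
from torch.utils.data import DataLoader

def make_loader(dataset, workers=8):
    # num_workers > 0 spawns that many worker processes for loading and
    # preprocessing; the OS schedules them across the available CPU cores.
    # A common starting point is the number of physical cores.
    return DataLoader(
        dataset,
        batch_size=12,
        shuffle=True,
        num_workers=workers,  # increase if data loading is the bottleneck
        pin_memory=True,      # speeds up host-to-GPU transfers
        drop_last=True,
    )
```

If increasing num_workers no longer helps, the remaining bottleneck is usually storage read speed rather than CPU count.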