lhoyer / improving_segmentation_with_selfsupervised_depth

[CVPR21] Implementation of our work "Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation"

How to do inference on my own images? #5

Closed fjremnav closed 3 years ago

fjremnav commented 3 years ago

I am not able to find instructions on how to do this. Please help.

Thanks,

lhoyer commented 3 years ago

So far this repository only supports training and evaluation on Cityscapes, CamVid, and Mapillary. To train and evaluate with your own dataset, please have a look at Issue #1.

fjremnav commented 3 years ago

I just want to use your Cityscapes-pretrained model to run inference on my test images, because they are similar to Cityscapes. I have no plans to do training.

Thanks,

lhoyer commented 3 years ago

I have added this functionality in 1a6f4a8. Please have a look at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth#inference-with-a-pretrained-model for instructions.

fjremnav commented 3 years ago

It works. How do I change the image resolution? My test images are 1920x1080 and I want 1920x1080 output, but the current setting produces 1024x512 output.

Thanks,

lhoyer commented 3 years ago

You can basically add the following two lines in inference.py:

cfg['monodepth_options']['height'] = 1080  # desired output height
cfg['monodepth_options']['width'] = 1920   # desired output width
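
For context, a hedged sketch of how the overridden config could drive the inference entry point; the config path and loading details below are illustrative assumptions, as only inference_main and the cfg keys appear in this thread:

import yaml

from inference import inference_main  # repo entry point (also visible in the traceback below)

# Hypothetical config path for illustration; the actual file may differ.
with open('configs/inference.yml') as fp:
    cfg = yaml.safe_load(fp)

# Override the monodepth input resolution before the model is built.
cfg['monodepth_options']['height'] = 1080
cfg['monodepth_options']['width'] = 1920

inference_main(cfg)
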
fjremnav commented 3 years ago

@lhoyer

I followed your suggestion by adding those two lines in inference.py and ran it again, but got the following error:

Start inference2021-07-06_01-07-15-394479
RUNDIR: results//inference2021-07-06_01-07-15-394479/
Found 1200 val images
Load mono_cityscapes_1024x512_r101dil_aspp_dec6_lr5_fd2_crop512x512bs4 weights
Load mono_cityscapes_1024x512_r101dil_aspp_dec6_lr5_fd2_crop512x512bs4 depth weights
Load mono_cityscapes_1024x512_r101dil_aspp_dec6_lr5_fd2_crop512x512bs4 depth weights
Validate inference2021-07-06_01-07-15-394479
0%| | 0/600 [00:00<?, ?it/s]
PAD run first half of decoder ([4, 3, 2]).
bottleneck shape torch.Size([2, 2048, 68, 120])
upconv4-0 shape: torch.Size([2, 256, 68, 120])
concatenated features shape: torch.Size([2, 1280, 68, 120])
upconv4-1 shape: torch.Size([2, 256, 68, 120])
upconv3-0 shape: torch.Size([2, 256, 68, 120])
0%| | 0/600 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 174, in <module>
    inference_main(cfg)
  File "inference.py", line 138, in inference_main
    inference.run()
  File "inference.py", line 96, in run
    outputs = self.model(inputs_val)
  File "/home/henry.jeng/anaconda3/envs/remnav/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/4TB/fc_tmp/git/improving_segmentation_with_selfsupervised_depth/models/joint_segmentation_depth.py", line 84, in forward
    outputs.update(self.models["mtl_decoder"](...))
  File "/home/henry.jeng/anaconda3/envs/remnav/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/4TB/fc_tmp/git/improving_segmentation_with_selfsupervised_depth/models/joint_segmentation_depth_decoder.py", line 146, in forward
    depth_features = self.depth_dec(encoder_features, exec_layer=first_exec_layers)
  File "/home/henry.jeng/anaconda3/envs/remnav/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/4TB/fc_tmp/git/improving_segmentation_with_selfsupervised_depth/models/depth_decoder.py", line 100, in forward
    x = torch.cat(x, 1)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 135 and 136 (The offending index is 0)

lhoyer commented 3 years ago

The problem is that the network architecture only supports resolutions that can be divided by 2 five times without remainder, i.e. both height and width must be multiples of 2^5 = 32. A width of 1920 satisfies this (1920 / 32 = 60), but a height of 1080 does not (1080 / 32 = 33.75), which causes the torch.cat size mismatch above. Possible solutions are rescaling the images to an appropriate resolution, e.g. 2048 x 1152 (which preserves the 16:9 aspect ratio), or cropping the images to fulfill that requirement.
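
To make the constraint concrete, here is a small illustrative helper (not part of the repository) that rounds a dimension to the nearest valid multiple of 32:

# Illustrative sketch only: round a dimension to the nearest multiple of
# 32, since the encoder halves the resolution five times (2**5 = 32).
def round_to_multiple(x, base=32):
    return max(base, round(x / base) * base)

print(round_to_multiple(1920))  # 1920 -- already valid (1920 / 32 = 60)
print(round_to_multiple(1080))  # 1088 -- nearest valid height (1080 / 32 = 33.75)

Alternatively, upscaling 1920x1080 to the suggested 2048x1152 keeps the exact 16:9 aspect ratio while satisfying the constraint.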