isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

2.1 Broke my ability to run on Ubuntu 18.x #65

Open · gateway opened 3 years ago

gateway commented 3 years ago

So I just updated today on my Ubuntu 18.x box, which seems to have broken something for me since I can no longer run the model. I did a successful test right before doing the pull. Thoughts?

(base) gateway@gateway-media:~/work/depth/MiDaS$ python run.py 
initialize
device: cuda
Loading weights:  model-f6b98070.pt
Downloading: "https://github.com/facebookresearch/WSL-Images/archive/master.zip" to /home/gateway/.cache/torch/hub/master.zip
Traceback (most recent call last):
  File "run.py", line 151, in <module>
    run(args.input_path, args.output_path, args.model_weights, args.model_type, args.optimize)
  File "run.py", line 32, in run
    model = MidasNet(model_path, non_negative=True)
  File "/home/gateway/work/depth/MiDaS/midas/midas_net.py", line 47, in __init__
    self.load(path)
  File "/home/gateway/work/depth/MiDaS/midas/base_model.py", line 11, in load
    parameters = torch.load(path, map_location=torch.device('cpu'))
  File "/home/gateway/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/gateway/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1579040055865/work/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /opt/conda/conda-bld/pytorch_1579040055865/work/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f908fa0a627 in /home/gateway/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f909974de2b in /home/gateway/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f909974f044 in /home/gateway/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6d2146 (0x7f90c65eb146 in /home/gateway/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x28ba06 (0x7f90c61a4a06 in /home/gateway/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #40: __libc_start_main + 0xe7 (0x7f90e4490bf7 in /lib/x86_64-linux-gnu/libc.so.6)

ranftlr commented 3 years ago

Seems to be similar to #56. You might be running an older version of PyTorch. Can you upgrade to PyTorch 1.7 and try again?
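
For context, the assert means the checkpoint was written with a newer zip-based serialization format than the installed PyTorch can read. If upgrading is not an option, one possible workaround is to re-save the weights in the legacy format, which older readers understand. A minimal sketch, assuming access to a second machine with PyTorch >= 1.6 (the "-legacy" filename is just an example):

import torch

# Load the v2.1 checkpoint with a recent PyTorch, then re-save it in
# the legacy (non-zipfile) format so an older install can read it.
state_dict = torch.load("model-f6b98070.pt", map_location="cpu")
torch.save(state_dict, "model-f6b98070-legacy.pt",
           _use_new_zipfile_serialization=False)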

gateway commented 3 years ago

Well, crap. OK, I created a virtual env with conda (Python 3.7) and installed the dependencies with conda install opencv pytorch torchvision -c pytorch, and now it seems to be working again. Thanks!

conda create --name midas python=3.7
conda activate midas
conda install opencv pytorch torchvision -c pytorch
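
A quick sanity check after creating the environment, to confirm the installed PyTorch version and that CUDA is visible (nothing MiDaS-specific, just standard torch calls):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"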

By the way, has anyone tried this on equirectangular images yet? I do a lot of 360 photography and am wondering if this would help with depth map creation.

ranftlr commented 3 years ago

Great!

With respect to equirectangular images: we haven't tried, but we would be curious to hear what you find out. The extreme distortions produced by spherical cameras are not present in the training datasets, so I don't have high expectations here. However, MiDaS has surprised us before; for example, it works surprisingly well when applied to cartoons and paintings.

gateway commented 3 years ago

Here is an example of a 360 pano and the depth map created from it. Kind of a complex example.

It seems like the sharp edges of things get a bit fuzzy and are not as crisp as I would expect. Are there any additional params I can play around with? And one last question: would it be possible to train on some sort of indoor 360 panos from tours? How many images would we need?

(attached images: tre1, tre1_depth, tre-overlay)

ranftlr commented 3 years ago

Thanks, this works better than I would have expected. With respect to parameters, there really is not much you can play with. The only knob is the resolution at which prediction happens (the resize operation in the transform, currently 384x384). Increasing it might give sharper results, but in our experience, while the results might look better, they are overall less accurate. You could also try applying MiDaS to the raw images and then stitching the depth maps together, the same way you stitch the RGB images.

The required number of images is hard to tell in advance, as it depends on your accuracy requirements, the diversity of the data, and the quality of the ground truth. Obviously, the more images the better. Maybe the ReDWeb dataset can act as guidance: it contains about 3,500 images with very diverse content and reasonably good ground truth, and it already gets you quite far.
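
To make the resolution knob concrete: in v2.1 the resize happens in the transform built in run.py. Below is a sketch of what bumping it to 512 looks like, based on the transform setup in the repo (check midas/transforms.py for the exact keyword names; the encoder expects input dimensions that are multiples of 32, which ensure_multiple_of enforces):

import cv2
from torchvision.transforms import Compose
from midas.transforms import Resize, NormalizeImage, PrepareForNet

# Default is 384; larger inputs may look sharper but, per the comment
# above, can be less accurate overall.
net_w = net_h = 512

transform = Compose(
    [
        Resize(
            net_w,
            net_h,
            resize_target=None,
            keep_aspect_ratio=True,
            ensure_multiple_of=32,  # encoder requires multiples of 32
            resize_method="upper_bound",
            image_interpolation_method=cv2.INTER_CUBIC,
        ),
        NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        PrepareForNet(),
    ]
)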

gateway commented 3 years ago

> Thanks, this works better than I would have expected. With respect to parameters, there really is not much you can play with. The only knob is the resolution at which prediction happens (the resize operation in the transform, currently 384x384). Increasing it might give sharper results, but in our experience, while the results might look better, they are overall less accurate. You could also try applying MiDaS to the raw images and then stitching the depth maps together, the same way you stitch the RGB images.

What do you mean by applying it to the raw images? They are fisheye and would probably not work as well?

> The required number of images is hard to tell in advance, as it depends on your accuracy requirements, the diversity of the data, and the quality of the ground truth. Obviously, the more images the better. Maybe the ReDWeb dataset can act as guidance: it contains about 3,500 images with very diverse content and reasonably good ground truth, and it already gets you quite far.

Hmm, how can I train on this dataset? I downloaded it and it has a lot of images. I have a few Titan RTXs. We in the 360 virtual tour community are looking for solutions to help us get better depth maps. Again, thank you for taking the time to reply! Cheers