mli0603 / stereo-transformer

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers. (ICCV 2021 Oral)

Apache License 2.0

666 stars, 106 forks

Custom Data Inference #28

Open aakash26 opened 3 years ago

aakash26 commented 3 years ago

Hi authors, thanks for providing the code for training and inference. I want to run STTR on custom frames to see the depth output. As seen in the training data, we need to provide an occlusion map and an initial disparity along with the stereo images. Do you have any suggestions or algorithms for obtaining the occlusion information from stereo images so that STTR can be run on custom data?

Thanks and regards, Aakash Rajpal

mli0603 commented 3 years ago

Hi @aakash26, thanks for reaching out. As mentioned in Q&A item 3, there are two types of occlusion: 1) at the left image border, and 2) at the left border of objects.

If you have disparities from both images (as in Scene Flow or Middlebury), you can easily identify these regions using the script here.
If you only have one disparity, you can identify 1) the left image border occlusion by checking whether the corresponding pixel would have a negative coordinate in the right image, which is already implemented in the same script. For 2) the left object border occlusion, I think you are out of luck. This is a problem with the KITTI dataset as well, where left object borders are not identified at all. If you trust the output from STTR, I guess you can run STTR on your data first to get occlusion information, then assume it is correct and exclude those pixels from training. But I understand this is far from ideal...
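The two checks described above can be sketched in NumPy. This is not the script linked in the comment; it is a minimal illustration that assumes positive disparities, where a left-image pixel at column x matches the right-image pixel at column x - d. The function names and the `tol` threshold are hypothetical.

```python
import numpy as np

def border_occlusion(disp_left):
    """Type 1: left pixels whose match would fall left of the right image."""
    w = disp_left.shape[1]
    x = np.arange(w, dtype=np.float32)[None, :]  # column index of every pixel
    return (x - disp_left) < 0                   # negative column in the right image

def lr_consistency_occlusion(disp_left, disp_right, tol=1.0):
    """Types 1 and 2 via a left-right consistency check (needs both disparities).

    tol is an assumed threshold in pixels, not a value taken from STTR.
    """
    h, w = disp_left.shape
    x = np.arange(w)[None, :].repeat(h, axis=0)
    y = np.arange(h)[:, None].repeat(w, axis=1)
    # Column in the right image that each left pixel maps to.
    x_right = np.rint(x - disp_left).astype(np.int64)
    outside = x_right < 0
    # Clamp only so the lookup is valid; out-of-image pixels are flagged anyway.
    x_clamped = np.clip(x_right, 0, w - 1)
    disp_back = disp_right[y, x_clamped]
    inconsistent = np.abs(disp_left - disp_back) > tol
    return outside | inconsistent
```

With only a left disparity, `border_occlusion` is all that can be computed; the consistency check needs the right disparity as well, which matches the limitation described above.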

I hope this helps!

aakash26 commented 3 years ago

Hi @mli0603, thanks for the information and explanation. I have both disparities (depth maps). However, when trying to run inference on custom videos, I am not sure where and when to provide the occlusion mask, since the inference examples (Colab and regular) are based on the KITTI dataset, which combines disparity and occlusion in one image file. How do I run inference on custom videos? With the provided inference script, the output disparities are poor and unusable. Could you please guide me for inference only? Also, which pretrained weights would be better for custom videos?

mli0603 commented 3 years ago

Hi @aakash26,

Is your dataset similar to KITTI?

If so, you can use the KITTI pretrained weights.
If not, you should use the Scene Flow pretrained weights. This one provides better generalization.

aakash26 commented 3 years ago

Hi @mli0603,

Thanks for the reply. My dataset is not similar to KITTI; it is closer to the SCENE_FLOW dataset you provided in sample_data, so I am using the Scene Flow pretrained weights. However, my question is: for this dataset, can you provide an inference script or some idea of how to use the STTR model? The only provided inference script is for the KITTI dataset, whose disparity and occlusion images are in a different format. Thanks again for sharing the code.

Regards Aakash

Rashfu commented 2 years ago

Hi @mli0603. I'm curious why the data formats of KITTI and SCARED differ, since both provide stereo GT depth. In your code, the KITTI data comes with disp_occ, while the SCARED data comes with occ_mask, where 128 represents occlusion. Could you explain the reason and give me some advice on how to process the SCARED dataset into your format?
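For readers working with a mask like the one described above, decoding it could look like the sketch below. It assumes the mask is an 8-bit array in which the value 128 marks occluded pixels, as the comment states. The helper names are hypothetical, and using 0 as the "invalid" disparity value is an assumption borrowed from the KITTI convention, not something confirmed for STTR's loaders.

```python
import numpy as np

OCCLUDED_VALUE = 128  # value reported in the comment above; verify against your files

def occlusion_from_mask(mask_u8):
    """Boolean occlusion map: True where the mask equals OCCLUDED_VALUE."""
    return np.asarray(mask_u8) == OCCLUDED_VALUE

def zero_out_occluded(disp, occ):
    """Set occluded disparities to 0 (assumed here to mean 'invalid')."""
    return np.where(occ, 0.0, disp).astype(np.float32)
```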