uber-research / DeepPruner

DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch (ICCV 2019)

question about the refer output of model #44

Closed Cai-RS closed 2 months ago

Cai-RS commented 2 months ago

Thanks for your great work! In submission_kitti.py, why do you multiply the inferred output disparity map by 256? Isn't the direct output of the model already the final disparity estimate (at the original image size)? https://github.com/uber-research/DeepPruner/blob/40b188cf954577e21d5068db2be2bedc6b0e8781/deeppruner/submission_kitti.py#L124C1-L125C82

ShivamDuggal4 commented 2 months ago

Hi @Cai-RS, thanks for the kind words! I would recommend just running the network and checking the range of the output disparity. Multiplying by 256 is necessary to visualize any floating-point 2D map (with value range 0...1) as a .png image.

You can also find the range of the disparity in the dataloader: https://github.com/uber-research/DeepPruner/blob/40b188cf954577e21d5068db2be2bedc6b0e8781/deeppruner/dataloader/kitti_loader.py#L73

I hope this answers your query, closing the issue! Best Regards

Cai-RS commented 2 months ago

Thanks for your reply! I found that KITTI specifies a disparity range of (0, 256) for each pixel of the left image. KITTI disparity maps are stored as uint16, and each stored value, once converted to float, equals the real disparity * 256. So I think the network's inference results are multiplied by 256 in order to submit them to the KITTI platform for testing? In that case, the inference results should be floating-point disparities in pixels, in the range (0, max_disp), where max_disp is 192 by default in this work: https://github.com/uber-research/DeepPruner/blob/40b188cf954577e21d5068db2be2bedc6b0e8781/deeppruner/models/config.py#L29 Are you sure that the range of the network's inference results is (0, 1)? (I'm sorry that I don't have a device to run the model at the moment; I'm still learning the theory.)
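
To make the encoding described above concrete, here is a minimal sketch of the round trip between a floating-point disparity map in pixels and the uint16 values stored in a KITTI-style PNG. It is an illustration under stated assumptions, not the code from submission_kitti.py: the function names `encode_kitti_disparity` and `decode_kitti_disparity` are hypothetical, the sample values are made up, and writing the array to disk with an image library is left out.

```python
import numpy as np


def encode_kitti_disparity(disp):
    """Float disparity in pixels -> uint16 values (disparity * 256).

    Hypothetical helper for illustration; not taken from the DeepPruner repo.
    """
    disp = np.asarray(disp, dtype=np.float32)
    # Scale by 256 and clip to the range a uint16 can represent.
    scaled = np.clip(disp * 256.0, 0, 65535)
    return np.round(scaled).astype(np.uint16)


def decode_kitti_disparity(raw):
    """uint16 stored values -> (float disparity in pixels, validity mask).

    In KITTI ground-truth maps a stored value of 0 marks a pixel with no
    valid measurement, so it is reported separately in the mask.
    """
    raw = np.asarray(raw, dtype=np.uint16)
    disp = raw.astype(np.float32) / 256.0
    valid = raw > 0
    return disp, valid


# Made-up predictions in pixels, within (0, max_disp) for max_disp = 192.
pred = np.array([[0.0, 1.5], [100.25, 192.0]], dtype=np.float32)

raw = encode_kitti_disparity(pred)
disp, valid = decode_kitti_disparity(raw)

print(raw)    # stored integers, e.g. 1.5 px is stored as 384
print(disp)   # recovered disparities in pixels
print(valid)  # False only where the stored value is 0
```

Under this encoding the resolution is 1/256 of a pixel and the largest representable disparity is just under 256 pixels, which is consistent with the (0, 256) range mentioned above.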