lorenmt / mtan

The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].
https://shikun.io/projects/multi-task-attention-network
MIT License

Cityscapes depth #25

Closed liyangliu closed 4 years ago

liyangliu commented 4 years ago

Hi, @lorenmt. I downloaded your processed Cityscapes dataset and found that the values in those numpy arrays are >= 0 (most of them probably < 0.5). When I load the official Cityscapes disparity, the values are also >= 0 but much larger (up to around 30000). Could you tell me how you pre-processed the original disparity data to get those numpy arrays? Thanks in advance.

lorenmt commented 4 years ago

Hi,

The pre-processed Cityscapes depth is inverse depth, which makes it easier to represent infinite depth such as sky.

In the original paper, we directly used this inverse depth as the ground-truth depth and did not apply any additional data processing.

Hope this helps.

NareshGuru77 commented 3 years ago

Hi,

Could you elaborate on how you get the ground-truth inverse depth values? Do you divide all the inverse depth values by a maximum number? If so, what is this number?

lorenmt commented 3 years ago

The inverse depth is actually the original Cityscapes disparity data. As stated in the previous comment, we did not apply any further processing.

NareshGuru77 commented 3 years ago

Thank you for your reply.

The original Cityscapes disparity data has values ranging from 0 to 32257. The inverse depth values provided in your Dropbox range from 0 to 0.4922.

This means that some processing has been done to bring the original disparity values into the range 0 to 1, right? Could you provide the details?

lorenmt commented 3 years ago

That is probably because you were using Image.open to decode the image file, which returns a different data type. Try plt.imread instead; you should then get identical values.

NareshGuru77 commented 3 years ago

Thank you. With plt.imread I am able to get the same values.

"binary_mask = (torch.sum(x_output, dim=1) != 0).unsqueeze(1).to(device)"

In this binary mask for the depth error, could you let me know whether the goal is to exclude the zero-valued pixels in every ground-truth depth map?

Thank you.

lorenmt commented 3 years ago

Since those ground-truth depths were recorded with a real-life measuring sensor, they are typically not perfect. If you visualise the depth, you can easily see that the 0-values are invalid, meaning no valid depth values were recorded at those pixels.

Hope that helps.
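To make the masking idea above concrete, here is a minimal sketch of a masked absolute depth error. It is a NumPy analog of the quoted torch line, not the repository's actual loss code: the function name `masked_abs_depth_error`, the sample arrays, and the choice to return 0.0 when no pixel is valid are all assumptions made for illustration. It also assumes a single depth channel.

```python
import numpy as np

def masked_abs_depth_error(pred, gt):
    """Mean absolute error over pixels whose ground-truth depth is non-zero.

    pred, gt: arrays of shape (batch, 1, height, width).
    Hypothetical helper for illustration only.
    """
    # Same idea as the quoted torch line: sum over the channel axis,
    # keep pixels where the sum is non-zero, then restore the channel axis.
    mask = (gt.sum(axis=1) != 0)[:, None]
    n_valid = mask.sum()
    if n_valid == 0:
        return 0.0  # assumption: report zero error when nothing is valid
    # Zero out the error at invalid pixels, then average over valid ones only.
    abs_err = np.abs(pred - gt) * mask
    return float(abs_err.sum() / n_valid)

# One 2x2 image; the two 0.0 entries in gt are invalid measurements.
gt = np.array([[[[0.0, 0.2],
                 [0.4, 0.0]]]])
pred = np.array([[[[0.3, 0.1],
                   [0.1, 0.9]]]])
error = masked_abs_depth_error(pred, gt)
print(error)
```

Only the two valid pixels contribute to the result, so the large prediction errors at the zero-valued ground-truth pixels are ignored.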

NareshGuru77 commented 3 years ago

Thank you. That was very helpful.

ganyz commented 3 years ago

"Thank you for your reply.

The original Cityscapes disparity data has values ranging from 0 to 32257. The inverse depth values provided in your Dropbox range from 0 to 0.4922.

This means that some processing has been done to bring the original disparity values into the range 0 to 1, right? Could you provide the details?"

I have the same confusion. Have you solved the problem?

NareshGuru77 commented 3 years ago

"That is because you probably were using Image.open to decode the image file, which has a different data type. Try to use plt.imread, then you should get identical values."

As @lorenmt suggested, after using plt.imread I got the same range as in the data provided on Dropbox.

ganyz commented 3 years ago

Thank you! When I use plt.imread, the values are the same as in the provided data.

But I have another question. The official Cityscapes scripts page https://github.com/mcordts/cityscapesScripts says:

"disparity: precomputed disparity depth maps. To obtain the disparity values, compute for each pixel p with p > 0: d = ( float(p) - 1. ) / 256., while a value p = 0 is an invalid measurement. Warning: the images are stored as 16-bit pngs, which is non-standard and not supported by all libraries."

The data's maximum value is 0.4922, which would make the disparity values all negative, because (0.4922 - 1) / 256 < 0. Are the disparity values supposed to be < 0? This confuses me a lot. >.<

ganyz commented 3 years ago

"Thank you for your reply.

The original Cityscapes disparity data has values ranging from 0 to 32257. The inverse depth values provided in your Dropbox range from 0 to 0.4922.

This means that some processing has been done to bring the original disparity values into the range 0 to 1, right? Could you provide the details?"

Maybe I have figured it out. In plt.imread, if the image is a .png, the function returns float values in [0, 1], calculated as the raw value divided by the maximum 16-bit value (65535).

The plt.imread doc says: "PNG images are returned as float arrays (0-1). All other formats are returned as int arrays, with a bit depth determined by the file's contents."

0 to 32257, divided by 65535 -> 0 to 0.4922
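The scaling described above can be checked with a few lines of NumPy. This is only a sketch of the arithmetic, not a call into matplotlib or PIL; the helper names `normalize_uint16` and `denormalize_uint16` are hypothetical, and the three sample values are chosen to cover the range reported in this thread.

```python
import numpy as np

UINT16_MAX = 65535  # largest value a 16-bit PNG sample can hold

def normalize_uint16(raw):
    """Scale raw 16-bit samples to floats in [0, 1]."""
    return np.asarray(raw, dtype=np.float64) / UINT16_MAX

def denormalize_uint16(normalized):
    """Recover the raw 16-bit integers from floats in [0, 1]."""
    scaled = np.asarray(normalized, dtype=np.float64) * UINT16_MAX
    # Round to the nearest integer to undo floating-point error.
    return np.rint(scaled).astype(np.int64)

raw = np.array([0, 1, 32257])          # raw range reported for the disparity PNGs
normalized = normalize_uint16(raw)     # maximum is about 0.4922
recovered = denormalize_uint16(normalized)

print(normalized.max())
print(recovered)
```

Because the scaling is a plain division by a constant, the raw integers can be recovered from the normalized floats by multiplying by 65535 and rounding.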

lorenmt commented 3 years ago

I would highly suggest checking out this post: https://github.com/mcordts/cityscapesScripts/issues/55#issuecomment-411486510 on computing the real depth.

Using plt.imread here is more of an approximation; you need the focal length to fully convert it into real depth.
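As a rough sketch of what a full conversion might look like, the snippet below applies the formula quoted earlier from the cityscapesScripts README, d = (p - 1) / 256 for raw 16-bit values p > 0, and then the standard stereo relation depth = baseline * focal length / disparity. The function names are hypothetical, and the baseline (0.2 m) and focal length (2000 px) are placeholder numbers, not Cityscapes calibration values; real values would have to come from the dataset's camera calibration. If the values were loaded with plt.imread, they would first need to be multiplied by 65535 and rounded to get back to raw p.

```python
import numpy as np

def raw_to_disparity(raw):
    """Apply d = (p - 1) / 256 for p > 0; p == 0 is invalid and becomes NaN."""
    raw = np.asarray(raw, dtype=np.float64)
    return np.where(raw > 0, (raw - 1.0) / 256.0, np.nan)

def disparity_to_depth(disparity, baseline_m, focal_px):
    """Standard stereo relation: depth = baseline * focal / disparity.

    Zero or invalid (NaN) disparities are returned as NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    valid = disparity > 0  # NaN compares as False, so invalid pixels drop out
    safe = np.where(valid, disparity, 1.0)  # avoid dividing by zero or NaN
    return np.where(valid, baseline_m * focal_px / safe, np.nan)

raw = np.array([0, 32257])             # invalid pixel and the reported maximum
disparity = raw_to_disparity(raw)      # 32257 maps to (32257 - 1) / 256 = 126.0
# Placeholder camera parameters, assumed for illustration only.
depth = disparity_to_depth(disparity, baseline_m=0.2, focal_px=2000.0)

print(disparity)
print(depth)
```

Note that the largest disparity corresponds to the nearest point, which is why the unconverted values behave like inverse depth.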