awsaf49 opened this issue 1 year ago
Ccing @dwofk (the author of fast-depth).

Thanks, @awsaf49, for reporting this. I believe this is because the NYU Depth V2 shipped from fast-depth is already preprocessed.

If you think it might be better to have the NYU Depth V2 dataset from BTS here, feel free to open a PR; I am happy to provide guidance :)
Good catch! Ideally, it would be nice to have the datasets in raw form; this way, users can choose whatever processing they want to apply.
@sayakpaul I would love to create a PR on this. As this will be my first PR here, some guidance would be helpful.
Need a bit of advice on the dataset: there are three publicly available versions. Which one should I consider for the PR?

1. BTS: depth maps come as `uint16`, hence more precise.
2. DenseDepth: depth maps come as `uint8`, hence less precise.
3. The raw distribution: comes as `pgm` and `dump` files, hence can't be used directly.

cc: @lhoestq
I think BTS. Repositories like https://github.com/vinvino02/GLPDepth usually use BTS. Also, just for clarity, the PR will be to https://huggingface.co/datasets/sayakpaul/nyu_depth_v2. Once we have worked it out, we can update the following things:

* [Add a post for NYU Depth V2 in 🤗 Datasets blog #718](https://github.com/huggingface/blog/pull/718)
* https://huggingface.co/docs/datasets/main/en/depth_estimation
Don't worry about it if it seems overwhelming. We will work it out together :)
@lhoestq what do you think?
@sayakpaul If I get this right, I have to:

1. Open a PR on https://huggingface.co/datasets/sayakpaul/nyu_depth_v2 to build it from the BTS data.
2. Update the blog post in https://github.com/huggingface/blog.
3. Update the depth estimation docs at https://huggingface.co/docs/datasets/main/en/depth_estimation.
The last two are low-hanging fruits. Don't worry about them.
Yup, opening a PR to use BTS on https://huggingface.co/datasets/sayakpaul/nyu_depth_v2 sounds good :) Thanks for the help!
Finally, I have found the origin of the discretized depth map. When I first loaded the dataset from HF I noticed it was 30GB, whereas the DenseDepth data is only 4GB with dtype `uint8`. This means the data from fast-depth (before loading to HF) must have higher precision. So I dug deeper by directly loading the depth_map via `h5py`, and found that it comes as `float32`. But when the data is processed in HF with `datasets.Image()`, it is converted straight from `float32` to `uint8`, hence the discretized depth map.
https://github.com/huggingface/datasets/blob/c78559cacbb0ca6e0bc8bfc313cc0359f8c23ead/src/datasets/features/image.py#L91-L93
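To illustrate the failure mode (this is just a sketch, not the library's exact code path): an unconditional cast from `float32` to `uint8` collapses metric depth to at most 256 integer levels and truncates the fractional part.

```python
import numpy as np

# Hypothetical float32 depth values in meters.
depth = np.array([[0.5, 1.25, 2.2861192, 9.99]], dtype=np.float32)

# What a blind uint8 cast does: fractional precision is truncated away.
print(depth.astype(np.uint8))  # [[0 1 2 9]]
```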
Solution 1: Use the `Array2D` feature with `float32` for `depth_map`.

Code:

```python
Features({'depth_map': Array2D(shape=(480, 640), dtype='float32')})
```

Pros: no precision loss.

Cons: as `depth_map` is saved as an array, I think it can't be visualized on the hf.co/datasets page like a segmentation mask.
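For what it's worth, a minimal end-to-end sketch of Solution 1 with toy data (all names here are illustrative, not from the actual loader):

```python
import numpy as np
from datasets import Array2D, Dataset, Features

features = Features({"depth_map": Array2D(shape=(480, 640), dtype="float32")})
toy = {"depth_map": [np.random.rand(480, 640).astype("float32")]}

ds = Dataset.from_dict(toy, features=features).with_format("numpy")
depth = ds[0]["depth_map"]
print(depth.dtype, depth.shape)  # float32 (480, 640) -- no precision loss
```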
Solution 2: Use `uint16` as the dtype for `Image` in `_h5_loader` for saving depth maps, and accept the `uint16` dtype in the `datasets.Image()` feature.

Code:

```python
depth = np.array(h5f["depth"])
depth /= 10.0          # [0, max_depth] -> [0, 1], assuming max_depth = 10
depth *= (2**16 - 1)   # [0, 1] -> [0, 2^16 - 1]
depth = depth.astype('uint16')
```
Pros: only minor precision loss, and the depth map can still be visualized on the hf.co/datasets page.

Cons:

* Need changes in `_h5_loader` in https://huggingface.co/datasets/sayakpaul/nyu_depth_v2 to convert `depth_map` from `float32` to `uint16`.
* `datasets.Image()` needs to support converting an `np.ndarray` to `uint16` (checking the max value) rather than only `float32` to `uint8`.
* Users have to rescale depth maps from `[0, 2^16 - 1]` back to `[0, max_depth]` before feeding them to the model.
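For the last point, the inverse transform would be roughly (assuming the same `max_depth = 10` as in the encoding snippet above):

```python
import numpy as np

def decode_depth(depth_uint16: np.ndarray, max_depth: float = 10.0) -> np.ndarray:
    # [0, 2^16 - 1] -> [0, max_depth], back to float32 meters.
    return depth_uint16.astype(np.float32) / (2**16 - 1) * max_depth
```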
Thanks so much for digging into this.
Since the second solution entails changes to core datatypes in `datasets`, I think it's better to go with the first solution.
@lhoestq WDYT?
@sayakpaul Yes, Solution 1 requires minimal changes and has no precision loss. But I think support for `uint16` images would be a great addition, as many datasets come with `uint16` images. For example, in the UW-Madison GI Tract Image Segmentation dataset, the image itself comes in `uint16` dtype rather than the mask. So, saving a `uint16` image as `uint8` will result in precision loss.

Perhaps we can adapt Solution 1 for this issue and add support for `uint16` images separately?
Using `Array2D` makes it impractical to train a model - in `transformers` we expect an image type.
There is a pull request to support more precision than `uint8` in `Image()` here: https://github.com/huggingface/datasets/pull/5365/files

We can probably merge it today and do a release right away.
Fantastic, @lhoestq!
@awsaf49 then let's wait for the PR to get merged and then take the next steps?
Sure
The PR adds support for `uint16`, which is OK for BTS if I understand correctly. Would it be OK for you?
If the main issue with the current version of NYU we have on the Hub is related to the precision loss stemming from `Image()`, I'd prefer if `Image()` supported `float32` as well.
I also prefer `float32` as it offers more precision. But I'm not sure if we'll be able to visualize images with `float32` precision.
We could have a separate loading for the `float32` one using `Array2D`, but I feel like it's less convenient to use due to the amount of disk space, and because it's not an `Image()` type. That's why I think `uint16` is a better solution for users.
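Rough numbers behind the disk-space point (back-of-envelope only, ignoring Arrow and PNG overheads):

```python
h, w = 480, 640
print(f"float32 Array2D: {h * w * 4 / 2**20:.2f} MiB per map")  # ~1.17 MiB, stored uncompressed
print(f"uint16 Image:    {h * w * 2 / 2**20:.2f} MiB per map")  # ~0.59 MiB, before PNG compression
```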
A bit confused here: if https://github.com/huggingface/datasets/pull/5365 gets merged, won't this issue be resolved automatically?
Yes in theory :)
Actually, `float32` also seems to work in this PR (it just doesn't work for multi-channel).
In that case, a new PR isn't necessary, right?
Yep. I just tested from the PR and it works:

```python
>>> import numpy as np
>>> from datasets import load_dataset
>>> train_dataset = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)
Downloading readme: 100%|██████████████████| 8.71k/8.71k [00:00<00:00, 3.60MB/s]
>>> next(iter(train_dataset))
{'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=640x480 at 0x1382ED7F0>,
 'depth_map': <PIL.TiffImagePlugin.TiffImageFile image mode=F size=640x480 at 0x1382EDF28>}
>>> x = next(iter(train_dataset))
>>> np.asarray(x["depth_map"])
array([[0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       ...,
       [0.       , 2.2861192, 2.2861192, ..., 2.234162 , 2.234162 ,
        0.       ],
       [0.       , 2.2861192, 2.2861192, ..., 2.234162 , 2.234162 ,
        0.       ],
       [0.       , 2.2861192, 2.2861192, ..., 2.234162 , 2.234162 ,
        0.       ]], dtype=float32)
```
Great! The case is closed! This issue has been solved, and I have to say, it was quite the thrill ride. I felt like Sherlock Holmes solving a mystery and finding the bug 🕵️‍♂️. But in all seriousness, it was a pleasure working on this issue and I'm glad we could get to the bottom of it.
On another note, should I consider closing the issue? I think we still need to make updates to https://github.com/huggingface/blog and https://github.com/huggingface/datasets/blob/main/docs/source/depth_estimation.mdx
Haha, thanks Mr. Holmes :p

Maybe let's close this issue when we're done updating the blog post and the documentation.
@awsaf49 thank you for your hard work!
I am a little unsure why the other links need to be updated, though. They all rely on `datasets` internally.
I think the depth_map still shows the discretized version. It would be nice to have the corrected one.

Also, I think we need to make some changes in the code to visualize the depth_map, as it is now `float32`: `plt.imshow()` supports either `[0, 1]` with `float32` or `[0, 255]` with `uint8`.
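For example, a small sketch of the kind of change that might be needed (assuming matplotlib, and that `max_depth` is known; both are assumptions on my side):

```python
import matplotlib.pyplot as plt
import numpy as np

def show_depth(depth_map: np.ndarray, max_depth: float = 10.0) -> None:
    # Scale the metric float32 depth into [0, 1] so plt.imshow renders it.
    plt.imshow(np.clip(depth_map / max_depth, 0.0, 1.0))
    plt.axis("off")
    plt.show()
```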
Oh yes! Do you want to start with the fixes? Please feel free to say no but I wanted to make sure your contributions are reflected properly in our doc and the blog :)
Yes I think that would be nice :)
I'll make the changes tomorrow. I hope it's okay...
Totally fine! No rush. Thanks again for your hard work!
Just a little update from me:

As of the new release, the depth map is `float32`, so it takes much more memory. The same code now throws errors in Colab and Kaggle after consuming all the RAM/disk; on Kaggle it consumed nearly 80GB of disk and 30GB of RAM. I may need to take it to AWS or GCP.

I can do a quick workaround with `streaming=True`, similar to @lhoestq's snippet above, but then indexing won't be possible, and indexing is used for random image selection for visualization.
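Roughly what I mean (a sketch; `shuffle` + `take` on the streaming dataset approximates random selection without indexing):

```python
from datasets import load_dataset

ds = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)
# No random access on an IterableDataset, but a shuffled buffer + take
# gives a pseudo-random handful of samples for visualization.
samples = list(ds.shuffle(seed=42, buffer_size=1000).take(5))
```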
Also, I was hoping I could add an overlay of the image and depth map, as it would give a better sense of the prediction. Additionally, I'd use the `jet` colormap, as most of the literature uses it. Something like the sketch below,
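(The original comment attached an example image here; below is a rough matplotlib sketch of that kind of overlay, with the blending parameters being my own guesses.)

```python
import matplotlib.pyplot as plt
import numpy as np

def overlay_depth(image: np.ndarray, depth_map: np.ndarray, alpha: float = 0.6) -> None:
    # Draw the RGB image first, then blend the jet-colormapped depth on top.
    plt.imshow(image)
    plt.imshow(depth_map, cmap="jet", alpha=alpha)
    plt.axis("off")
    plt.show()
```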
Thanks for your suggestions! One of the main purposes of the guides is to keep the length minimal while maintaining completeness. So, I'd prefer to keep the code changes to a bare minimum. Does that make sense?
> Once we have worked it out, we can update the following things:
>
> * [Add a post for NYU Depth V2 in 🤗 Datasets blog #718](https://github.com/huggingface/blog/pull/718)
> * https://huggingface.co/docs/datasets/main/en/depth_estimation
>
> Don't worry about it if it seems overwhelming. We will work it out together :)
@sayakpaul Regarding huggingface/blog/pull/718, it seems the PR has not been merged yet.
That is correct, let's wait for that one.
Okay =)
Describe the bug
I think there is a discrepancy between the depth map of the `nyu_depth_v2` dataset here and the actual depth map. Depth values somehow got discretized/clipped, resulting in depth maps that differ from the actual ones. Here is a side-by-side comparison,

I tried to find the origin of this issue, but sadly, as I mentioned in tensorflow/datasets/issues/4674, the download link from fast-depth doesn't work anymore, so I couldn't verify whether the error originated there or during porting the data from there to HF.

Hi @sayakpaul, as you worked on huggingface/datasets/issues/5255, if you still have access to that data, could you please share it or perhaps check out this issue?
Steps to reproduce the bug
This notebook from @sayakpaul could be used to generate depth maps, and the actual ground truths can be checked against this dataset from the BTS repo.
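Alternatively, a quick sanity check (a sketch, run against the pre-fix dataset revision; a raw float32 depth map should contain thousands of distinct values, while a discretized one has only a handful):

```python
import numpy as np
from datasets import load_dataset

ds = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)
depth = np.asarray(next(iter(ds))["depth_map"])
print(depth.dtype, np.unique(depth).size)
```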
Expected behavior
Expected depth maps should be smooth rather than discrete/clipped.
Environment info
* `datasets` version: 2.8.1.dev0