huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Discrepancy in `nyu_depth_v2` dataset #5461

Open · awsaf49 opened 1 year ago

awsaf49 commented 1 year ago

Describe the bug

I think there is a discrepancy between the depth maps of the nyu_depth_v2 dataset here and the actual depth maps. The depth values somehow got discretized/clipped, resulting in depth maps that differ from the actual ones. Here is a side-by-side comparison:

[image: side-by-side comparison of the dataset's depth map and the actual depth map]

I tried to find the origin of this issue, but sadly, as I mentioned in tensorflow/datasets/issues/4674, the download link from fast-depth doesn't work anymore, so I couldn't verify whether the error originated there or during the porting of the data from there to HF.

Hi @sayakpaul, as you worked on huggingface/datasets/issues/5255, if you still have access to that data, could you please share it or perhaps check out this issue?

Steps to reproduce the bug

This notebook from @sayakpaul could be used to generate depth maps, and the actual ground truths can be checked against this dataset from the BTS repo.

Note: the BTS dataset has only 36K samples compared to the 50K in the train split here; they subsampled the data because adjacent frames look nearly identical.

Expected behavior

The depth maps should be smooth rather than discretized/clipped.

Environment info

sayakpaul commented 1 year ago

Cc'ing @dwofk (the author of fast-depth).

Thanks, @awsaf49, for reporting this. I believe this is because the NYU Depth V2 shipped from fast-depth is already preprocessed.

If you think it might be better to have the NYU Depth V2 dataset from BTS here, feel free to open a PR; I am happy to provide guidance :)

lhoestq commented 1 year ago

Good catch! Ideally it would be nice to have the dataset in its raw form; this way users can choose whatever processing they want to apply.

awsaf49 commented 1 year ago

> Cc'ing @dwofk (the author of fast-depth).
>
> Thanks, @awsaf49, for reporting this. I believe this is because the NYU Depth V2 shipped from fast-depth is already preprocessed.
>
> If you think it might be better to have the NYU Depth V2 dataset from BTS here, feel free to open a PR; I am happy to provide guidance :)

@sayakpaul I would love to create a PR on this. As this will be my first PR here, some guidance would be helpful.

Need a bit of advice on the dataset: there are three publicly available versions. Which one should I consider for the PR?

  1. BTS: Contains train/test: 36K/654 samples, dtype = uint16, hence more precise.
  2. DenseDepth: Contains train/test: 50K/654 samples, dtype = uint8, hence less precise.
  3. Official: 400GB+ in size, requires MATLAB code to fix projection and sync; data comes as pgm and dump files, hence it can't be used directly.

cc: @lhoestq

sayakpaul commented 1 year ago

I think BTS. Repositories like https://github.com/vinvino02/GLPDepth usually use BTS. Also, just for clarity, the PR will be to https://huggingface.co/datasets/sayakpaul/nyu_depth_v2. Once we have worked it out, we can update the following things:

* [Add a post for NYU Depth V2 in 🤗 Datasets blog #718](https://github.com/huggingface/blog/pull/718)
* https://huggingface.co/docs/datasets/main/en/depth_estimation

Don't worry about it if it seems overwhelming. We will work it out together :)

@lhoestq what do you think?

awsaf49 commented 1 year ago

@sayakpaul If I get this right, I have to:

  1. Create a PR on https://huggingface.co/datasets/sayakpaul/nyu_depth_v2
  2. Create a PR on https://github.com/huggingface/blog
  3. Create a PR on https://github.com/huggingface/datasets to update https://github.com/huggingface/datasets/blob/main/docs/source/depth_estimation.mdx

sayakpaul commented 1 year ago

The last two are low-hanging fruits. Don't worry about them.

lhoestq commented 1 year ago

Yup, opening a PR to use BTS on https://huggingface.co/datasets/sayakpaul/nyu_depth_v2 sounds good :) Thanks for the help!

awsaf49 commented 1 year ago

Finally, I have found the origin of the discretized depth maps. When I first loaded the dataset from HF I noticed it was 30GB, while the DenseDepth data is only 4GB with dtype=uint8. This means the data from fast-depth (before uploading to HF) must have had higher precision. When I dug deeper and loaded a depth_map directly with h5py, I found it came as float32. But when the data is processed in HF with datasets.Image(), it is converted directly from float32 to uint8, hence the discretized depth maps. https://github.com/huggingface/datasets/blob/c78559cacbb0ca6e0bc8bfc313cc0359f8c23ead/src/datasets/features/image.py#L91-L93
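
To illustrate (a minimal synthetic sketch, not the library's exact code path), casting a float32 depth map to uint8 collapses the continuous range into a handful of discrete levels:

```python
import numpy as np

# Synthetic float32 depth map; NYU depths span roughly 0-10 meters.
depth = np.linspace(0.0, 10.0, 480 * 640, dtype=np.float32).reshape(480, 640)

# Mimic the lossy cast: float32 -> uint8 truncates to whole numbers,
# so only the integer meter values 0..10 survive here.
discretized = depth.astype(np.uint8)

print(np.unique(depth).size)        # hundreds of thousands of distinct values
print(np.unique(discretized).size)  # 11
```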

Solutions:

1. Array2D

Use the Array2D feature with float32 for depth_map.

2. uint16

Use uint16 as the dtype for Image in _h5_loader when saving depth maps, and accept the uint16 dtype in the datasets.Image() feature.
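
For reference, here is a rough sketch of what Solution 1 could look like (the feature spec below is my assumption, not a final schema):

```python
from datasets import Array2D, Features, Image

# Assumed schema for Solution 1: keep the RGB image as an Image feature,
# but store the depth map losslessly as a float32 Array2D.
features = Features(
    {
        "image": Image(),
        "depth_map": Array2D(shape=(480, 640), dtype="float32"),
    }
)
```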

sayakpaul commented 1 year ago

Thanks so much for digging into this.

Since the second solution entails changes to core datatypes in datasets, I think it's better to go with the first solution.

@lhoestq WDYT?

awsaf49 commented 1 year ago

@sayakpaul Yes, Solution 1 requires minimal changes and causes no precision loss. But I think support for uint16 images would be a great addition, as many datasets come with uint16 images. For example, in the UW-Madison GI Tract Image Segmentation dataset, the image itself (not just the mask) comes with uint16 dtype. So, saving a uint16 image as uint8 will result in precision loss.
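
As a quick illustration of that precision loss (synthetic values, not the actual dataset):

```python
import numpy as np

# A synthetic 16-bit scan: uint16 allows up to 65536 intensity levels.
scan = np.arange(0, 65535, 7).astype(np.uint16)

# A naive 8-bit downcast keeps only the top byte, i.e. at most 256 levels.
as_uint8 = (scan >> 8).astype(np.uint8)

print(np.unique(scan).size, "->", np.unique(as_uint8).size)  # 9363 -> 256
```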

Perhaps we can adopt Solution 1 for this issue and add support for uint16 images separately?

lhoestq commented 1 year ago

Using Array2D makes it impractical to use for training a model; in transformers we expect an image type.

There is a pull request to support more precision than uint8 in Image() here: https://github.com/huggingface/datasets/pull/5365/files

We can probably merge it today and do a release right away.

sayakpaul commented 1 year ago

Fantastic, @lhoestq!

@awsaf49 then let's wait for the PR to get merged and then take the next steps?

awsaf49 commented 1 year ago

Sure

lhoestq commented 1 year ago

The PR adds support for uint16, which is OK for BTS if I understand correctly. Would that be OK for you?

sayakpaul commented 1 year ago

If the main issue with the current version of NYU we have on the Hub is related to the precision loss stemming from Image(), I'd prefer if Image() supported float32 as well.

awsaf49 commented 1 year ago

I also prefer float32 as it offers more precision. But I'm not sure if we'll be able to visualize images with float32 precision.

lhoestq commented 1 year ago

We could have a separate loading for the float32 one using Array2D, but I feel like it's less convenient to use due to the amount of disk space and because it's not an Image() type. That's why I think uint16 is a better solution for users.

awsaf49 commented 1 year ago

A bit confused here: if https://github.com/huggingface/datasets/pull/5365 gets merged, won't this issue be resolved automatically?

lhoestq commented 1 year ago

Yes in theory :)

lhoestq commented 1 year ago

Actually, float32 also seems to work in this PR (it just doesn't work for multi-channel images).

awsaf49 commented 1 year ago

In that case, a new PR isn't necessary, right?

lhoestq commented 1 year ago

Yep. I just tested from the PR and it works:

```python
>>> import numpy as np
>>> from datasets import load_dataset
>>> train_dataset = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)
Downloading readme: 100%|██████████████████| 8.71k/8.71k [00:00<00:00, 3.60MB/s]
>>> next(iter(train_dataset))
{'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=640x480 at 0x1382ED7F0>,
 'depth_map': <PIL.TiffImagePlugin.TiffImageFile image mode=F size=640x480 at 0x1382EDF28>}
>>> x = next(iter(train_dataset))
>>> np.asarray(x["depth_map"])
array([[0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       ...,
       [0.       , 2.2861192, 2.2861192, ..., 2.234162 , 2.234162 ,
        0.       ],
       [0.       , 2.2861192, 2.2861192, ..., 2.234162 , 2.234162 ,
        0.       ],
       [0.       , 2.2861192, 2.2861192, ..., 2.234162 , 2.234162 ,
        0.       ]], dtype=float32)
```

awsaf49 commented 1 year ago

Great, the case is closed! This issue has been solved, and I have to say, it was quite the thrill ride. I felt like Sherlock Holmes, solving a mystery and finding the bug 🕵️‍♂️. But in all seriousness, it was a pleasure working on this issue, and I'm glad we could get to the bottom of it.

On another note, should I close the issue now? I think we still need to make updates to https://github.com/huggingface/blog and https://github.com/huggingface/datasets/blob/main/docs/source/depth_estimation.mdx.

lhoestq commented 1 year ago

Haha thanks Mr Holmes :p

maybe let's close this issue when we're done updating the blog post and the documentation

sayakpaul commented 1 year ago

@awsaf49 thank you for your hard work!

I am a little unsure why the other links need to be updated, though. They all rely on datasets internally.

awsaf49 commented 1 year ago

I think the depth_map there still shows the discretized version. It would be nice to have the corrected one.

awsaf49 commented 1 year ago

Also, I think we need to make some changes in the code to visualize the depth_map now that it is float32. plt.imshow() supports either float32 in [0, 1] or uint8 in [0, 255].
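
Something like the following might work (a sketch; the per-image max normalization is my choice, not a settled fix):

```python
import numpy as np
import matplotlib.pyplot as plt
from datasets import load_dataset

ds = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)
sample = next(iter(ds))

# The depth map now decodes to float32, so scale it into [0, 1]
# before handing it to plt.imshow().
depth = np.asarray(sample["depth_map"], dtype=np.float32)
if depth.max() > 0:
    depth /= depth.max()
plt.imshow(depth)
plt.axis("off")
plt.show()
```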

sayakpaul commented 1 year ago

Oh yes! Do you want to start with the fixes? Please feel free to say no but I wanted to make sure your contributions are reflected properly in our doc and the blog :)

awsaf49 commented 1 year ago

Yes I think that would be nice :)

awsaf49 commented 1 year ago

I'll make the changes tomorrow. I hope it's okay...

sayakpaul commented 1 year ago

Totally fine! No rush. Thanks again for your hard work!

awsaf49 commented 1 year ago

Just a little update from me: since the new release stores the images as float32, the dataset takes up much more memory. The same code now throws errors in Colab and Kaggle after consuming all the RAM/disk; on Kaggle it consumed nearly 80GB of disk and 30GB of RAM. I may need to move to AWS or GCP.

awsaf49 commented 1 year ago

I can do a quick workaround with streaming=True, similar to @lhoestq's snippet above. But then indexing won't be possible, and indexing is used to select random images for visualization.
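
A sketch of that workaround (the buffer size is an arbitrary choice): shuffle the stream with a buffer and take a few samples instead of indexing randomly.

```python
from itertools import islice
from datasets import load_dataset

# Streaming avoids materializing the full dataset on disk.
ds = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)

# An iterable dataset can't be indexed, but shuffling with a buffer
# and taking the first few items is an approximate substitute.
samples = list(islice(ds.shuffle(seed=42, buffer_size=100), 4))
```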

Also, I was hoping to add an overlay of the image and depth map, as it would give a better perception. Additionally, we could use the jet colormap, as most of the literature uses it. Something like this:

[image: depth map overlaid on the RGB image with a jet colormap]
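
For the overlay, a rough sketch (the alpha value and normalization are my assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from datasets import load_dataset

ds = load_dataset("sayakpaul/nyu_depth_v2", split="train", streaming=True)
sample = next(iter(ds))

# Draw the RGB image first, then blend the normalized float32 depth map
# on top using the jet colormap common in the depth-estimation literature.
rgb = np.asarray(sample["image"])
depth = np.asarray(sample["depth_map"], dtype=np.float32)
plt.imshow(rgb)
plt.imshow(depth / max(float(depth.max()), 1e-6), cmap="jet", alpha=0.6)
plt.axis("off")
plt.show()
```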

sayakpaul commented 1 year ago

> Also, I was hoping to add an overlay of the image and depth map, as it would give a better perception. Additionally, we could use the jet colormap, as most of the literature uses it. Something like this:

Thanks for your suggestions! One of the main purposes of the guides is to keep the length minimal while maintaining completeness. So, I'd prefer to keep the code changes to a bare minimum. Does that make sense?

awsaf49 commented 1 year ago

> Once we have worked it out, we can update the following things:
>
> * [Add a post for NYU Depth V2 in 🤗 Datasets blog #718](https://github.com/huggingface/blog/pull/718)
> * https://huggingface.co/docs/datasets/main/en/depth_estimation
>
> Don't worry about it if it seems overwhelming. We will work it out together :)

@sayakpaul Regarding huggingface/blog/pull/718, it seems the PR has not been merged yet.

sayakpaul commented 1 year ago

> @sayakpaul Regarding huggingface/blog/pull/718, it seems the PR has not been merged yet.

That is correct, let's wait for that one.

awsaf49 commented 1 year ago

Okay =)