huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Predicted depth map incorrectly rendered as an image #34300

Open pkubiak opened 5 days ago

pkubiak commented 5 days ago

Who can help?

Hi, in the depth-estimation pipeline there is a code path that interpolates the model's predicted depth map to the size of the original image and then renders it as a PIL image: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/depth_estimation.py#L118-L122

Unfortunately, it does not account for the fact that bicubic interpolation can produce values outside the range of the original nodal values. In this case it means that we can sometimes get negative depth values.

import torch

tensor = torch.tensor([[[[0.8091, 0.1907], [0.0787, 0.9970]]]])
interpolated = torch.nn.functional.interpolate(tensor, (3, 3), mode="bicubic", align_corners=False)

print(interpolated.min() < tensor.min())  # bicubic overshoots below the input range
print(interpolated.min() < 0.0)           # the interpolated depth goes negative

Such a situation is not handled properly on line 122, which results in uint8 overflow and strange edge artefacts in the generated images.
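To see why a small negative overshoot turns into a bright pixel, here is a minimal sketch of the rescale-then-cast step (plain NumPy; the explicit int64 round-trip is only an assumption to make the wrap-around well-defined, since a direct negative-float-to-uint8 cast is platform-dependent):

```python
import numpy as np

# Hypothetical depth prediction whose minimum overshot below zero
# after bicubic interpolation.
pred = np.array([-0.79, 4.0, 9.6])

# Rescale to [0, 255] by the maximum only: negative values stay negative.
scaled = pred * 255 / pred.max()

# Casting a negative integer to uint8 wraps around modulo 256,
# so -20 becomes 236: a near-white pixel, i.e. the white border.
wrapped = scaled.astype(np.int64).astype(np.uint8)
print(wrapped)  # [236 106 255]
```

The wrapped value 236 is why the artefact shows up as a bright edge rather than a dark one.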

The error is model independent (tested on depth-anything/Depth-Anything-V2-Small-hf, depth-anything/Depth-Anything-V2-Base-hf, and Intel/dpt-large).

@amyeroberts @qubvel @Rocketknight1


Reproduction

from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Base-hf")

pipe("https://a.allegroimg.com/original/11a694/019efb9b461c89a803be5c94f57d/LEGO-Marvel-Hulkbuster-76210-Iron-Man-Numer-produktu-76210")["depth"].show()

Noticeable white border (see the attached image).

Expected behavior

I would like to get a depth map image without strange edge artefacts :)

Something like the attached picture (the effect of switching from bicubic to bilinear interpolation).

Aryan8912 commented 5 days ago

.take

Rocketknight1 commented 5 days ago

@pkubiak yes, great issue and a clear bug. The obvious solution is either to switch to bilinear interpolation, or just to clamp the values to (0, 255) before we do the uint8 conversion. My preference is for clamping - give me a sec and I'll open the PR.
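The clamping option described above can be sketched as follows (a minimal illustration in NumPy, not the pipeline's actual code; the helper name `to_depth_image` is hypothetical):

```python
import numpy as np

def to_depth_image(pred):
    """Hypothetical helper: rescale a float depth map to [0, 255],
    clamp, then cast, so interpolation overshoot cannot wrap around
    in the uint8 conversion."""
    scaled = pred * 255 / pred.max()
    return np.clip(scaled, 0, 255).astype(np.uint8)

print(to_depth_image(np.array([-0.79, 4.8, 9.6])))  # [  0 127 255]
```

With clamping, the overshooting value maps to 0 (black) instead of wrapping around to a near-white pixel.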

Rocketknight1 commented 5 days ago

Wow - I had no idea bicubic was this bad. In your example, when we bicubic-interpolate a 518x518 prediction up to 1500x1500, the range of values goes from (0, 9.48) to (-0.79, 9.6)! I expected small rounding errors, but this is huge, and it means that clamping might lose some information.

Rocketknight1 commented 4 days ago

Talked about this with @rwightman, and the conclusion was that clamping the post-interpolation distribution to the original range before the rescaling + uint8 conversion is correct. @pkubiak would you be willing to make a PR for that? If you don't have time, let me know and I'll put it on my list, but thank you for the clear and helpful issue either way!
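A minimal sketch of that conclusion, reusing the tensors from the reproduction above (clamping happens after interpolation and before any rescaling to uint8):

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[[[0.8091, 0.1907], [0.0787, 0.9970]]]])
up = F.interpolate(pred, (3, 3), mode="bicubic", align_corners=False)

# Clamp the bicubic overshoot back to the range of the raw prediction,
# so the subsequent rescale + uint8 conversion cannot wrap around.
up = up.clamp(min=pred.min().item(), max=pred.max().item())
```

After the clamp, `up.min()` can no longer be negative, so the scaled depth image stays in [0, 255] by construction.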

pkubiak commented 4 days ago

@Rocketknight1: I see that in the meantime another PR has fixed this problem, but in a different way than you proposed: https://github.com/huggingface/transformers/pull/32550 🤷 It also introduced some breaking changes 🤷

Rocketknight1 commented 3 days ago

@pkubiak understood! If you have any issues after that PR, can you raise them on that PR page and ping the authors?