'DepthEstimationPipeline' object has no attribute 'image_size' when num_workers > 0

System Info

OS: Linux (Ubuntu 22)
transformers: 4.44.2
torch: 2.3.1

Who can help?

@Narsil

Information

[ ] The official example scripts
[x] My own modified scripts

Tasks

[x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

See code snippet below to reproduce:


from transformers import pipeline
import torch

pipe = pipeline(model="LiheYoung/depth-anything-base-hf", device=0, num_workers=1)

# To reproduce, the input must be a list of images; a one-element list is enough.
# Passing a single image outside a list does not reproduce the issue.
pipe(["<YOUR IMAGE>"])

This results in:

File "[...]/site-packages/transformers/pipelines/depth_estimation.py", line 109, in postprocess
    predicted_depth.unsqueeze(1), size=self.image_size[::-1], mode="bicubic", align_corners=False
AttributeError: 'DepthEstimationPipeline' object has no attribute 'image_size'

Upon investigation, you will see that self.image_size is set in preprocess(), which is executed by one of the workers, while (I believe) postprocess() is executed in the main process. Because these two processes do not share the same instance of DepthEstimationPipeline, self.image_size is never defined in the main process's address space. As a result, DepthEstimationPipeline only works when preprocess() and postprocess() run in the same process, i.e. when num_workers=0.
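To illustrate the mechanism described above, here is a minimal sketch that does not use transformers at all. ToyPipeline is a hypothetical stand-in, and copy.deepcopy is used only to imitate the separate copy of the object that a worker process would operate on; no real process is started, so this is an approximation of the behavior, not a trace of the library's code.

```python
import copy


class ToyPipeline:
    """Hypothetical stand-in that stores state on self in preprocess()."""

    def preprocess(self, image_size):
        # Same pattern as in the issue: the size is stashed on the instance.
        self.image_size = image_size
        return {"pixel_values": [0.0]}

    def postprocess(self, outputs):
        # Fails if preprocess() ran on a different copy of the object.
        return self.image_size[::-1]


main_copy = ToyPipeline()
# Assumption: a worker operates on its own copy of the pipeline object.
# deepcopy imitates that copy without spawning a process.
worker_copy = copy.deepcopy(main_copy)

worker_copy.preprocess((640, 480))  # runs "in the worker"
try:
    main_copy.postprocess({})       # runs "in the main process"
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The attribute exists on worker_copy but never reaches main_copy, which matches the AttributeError in the traceback above.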

A quick fix would be for DepthEstimationPipeline to accept an image_size argument that is forwarded through preprocess_params to preprocess(). The user could then create a pipeline by doing

pipe = pipeline(model="LiheYoung/depth-anything-base-hf", device=0, num_workers=1, image_size=(256,256))

for example. Of course, it's not as clean since it doesn't dynamically infer the image sizes from the user's input, but it should help. Any other ideas?
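One other idea, sketched here under assumptions: instead of storing the size on self, preprocess() could attach it to the dict it returns, so the value travels with the data from the worker to the main process. The class below is a hypothetical toy, not the real DepthEstimationPipeline; the "target_size" key and the fake image dict are names I made up for illustration, and I have not checked this against the actual pipeline internals.

```python
import copy


class ToyDepthPipeline:
    """Hypothetical stand-in, not the real DepthEstimationPipeline."""

    def preprocess(self, image):
        width, height = image["size"]
        # Attach the (height, width) target size to the returned dict, not to self.
        return {"pixel_values": image["pixels"], "target_size": (height, width)}

    def _forward(self, model_inputs):
        target_size = model_inputs.pop("target_size")
        # A real pipeline would call the model here; this toy echoes the input.
        return {
            "predicted_depth": model_inputs["pixel_values"],
            "target_size": target_size,
        }

    def postprocess(self, model_outputs):
        # The size arrives with the data, so no instance attribute is needed.
        return {
            "depth": model_outputs["predicted_depth"],
            "size": model_outputs["target_size"],
        }


main_copy = ToyDepthPipeline()
worker_copy = copy.deepcopy(main_copy)  # imitates the worker's separate instance

fake_image = {"size": (640, 480), "pixels": [0.0, 0.5, 1.0]}
model_inputs = worker_copy.preprocess(fake_image)                 # "worker" side
result = main_copy.postprocess(main_copy._forward(model_inputs))  # "main" side
print(result["size"])
```

This keeps the size inferred from the user's input, at the cost of threading one extra key through each stage.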

Expected behavior

DepthEstimationPipeline.__call__() should execute successfully with multiprocessing, i.e. when num_workers > 0.
