huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Intel/dpt-swinv2-tiny-256: TypeError: unsupported operand type(s) for //: 'NoneType' and 'NoneType' #31249

Closed yurithefury closed 5 months ago

yurithefury commented 5 months ago

System Info

Who can help?

@amyeroberts

Information

Tasks

Reproduction

from transformers import AutoImageProcessor, DPTForSemanticSegmentation
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("Intel/dpt-swinv2-tiny-256")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-swinv2-tiny-256")

inputs = image_processor(images=image, return_tensors="pt")

outputs = model(**inputs)
logits = outputs.logits

Error message:

Traceback (most recent call last):
  File "c:\Yuri\Repos\image-segmenter\huggingface.py", line 20, in <module>
    model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-swinv2-tiny-256")
  File "C:\Users\gis-user\miniconda3\envs\sem_seg\lib\site-packages\transformers\modeling_utils.py", line 3626, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\gis-user\miniconda3\envs\sem_seg\lib\site-packages\transformers\models\dpt\modeling_dpt.py", line 1260, in __init__
    self.dpt = DPTModel(config, add_pooling_layer=False)
  File "C:\Users\gis-user\miniconda3\envs\sem_seg\lib\site-packages\transformers\models\dpt\modeling_dpt.py", line 873, in __init__
    self.embeddings = DPTViTEmbeddings(config)
  File "C:\Users\gis-user\miniconda3\envs\sem_seg\lib\site-packages\transformers\models\dpt\modeling_dpt.py", line 220, in __init__
    self.patch_embeddings = DPTViTPatchEmbeddings(config)
  File "C:\Users\gis-user\miniconda3\envs\sem_seg\lib\site-packages\transformers\models\dpt\modeling_dpt.py", line 281, in __init__
    num_patches = (image_size[1] // patch_size[1]) * (image_size[0] // patch_size[0])
TypeError: unsupported operand type(s) for //: 'NoneType' and 'NoneType'

Expected behavior

I want to test the Intel/dpt-swinv2-tiny-256 model.

Model page: https://huggingface.co/Intel/dpt-swinv2-tiny-256
Example code is from: https://github.com/huggingface/transformers/blob/main/src/transformers/models/dpt/modeling_dpt.py

qubvel commented 5 months ago

Hi @yurithefury, thanks for the issue, I will have a look!

RUFFY-369 commented 5 months ago

Hi @yurithefury, I think the problem is that image_size and patch_size are both None in this model's config.json, yet they are accessed directly (e.g. config.image_size) in the modeling file. The same happens with other attributes such as num_hidden_layers and num_attention_heads: their values also end up as None in configuration_dpt, because use_autobackbone becomes True when backbone_config gets loaded.
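
A quick way to see this from the config alone (a minimal sketch; the expected values follow the description above rather than being verified here):

from transformers import DPTConfig

config = DPTConfig.from_pretrained("Intel/dpt-swinv2-tiny-256")

# The checkpoint defines a backbone_config, which switches DPT onto the
# autobackbone code path and leaves the plain ViT attributes unset.
print(config.backbone_config is not None)    # expected: True
print(getattr(config, "image_size", None))   # expected: None
print(getattr(config, "patch_size", None))   # expected: None -> None // None in DPTViTPatchEmbeddings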

qubvel commented 5 months ago

It seems like DPTForSemanticSegmentation does not work with "backbone_config"; DPTForDepthEstimation, however, correctly loads this model.

RUFFY-369 commented 5 months ago

It seems like DPTForSemanticSegmentation does not work with "backbone_config"; DPTForDepthEstimation, however, correctly loads this model.

Yeah, because no DPTModel instance gets created and only the backbone_config is used, so control never reaches this line: https://github.com/huggingface/transformers/blob/4a6024921fa142f28e8d0034ae28693713b3bfd0/src/transformers/models/dpt/modeling_dpt.py#L275 which is what causes the error in DPTForSemanticSegmentation, since those values are None.
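
To illustrate the observed behaviour (a minimal sketch based on the discussion above):

from transformers import DPTForDepthEstimation, DPTForSemanticSegmentation

# Loads fine: the backbone path never constructs DPTViTPatchEmbeddings.
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-tiny-256")

# Raises TypeError: DPTViTPatchEmbeddings computes image_size // patch_size,
# and both are None for this backbone-based checkpoint.
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-swinv2-tiny-256")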

amyeroberts commented 5 months ago

@qubvel @RUFFY-369 Indeed - the control flow for the DPT config isn't ideal - the hybrid / non-hybrid logic in the architecture should have been implemented as two separate models, as we now have unexpected behaviours, which also lead to issues like #30633 #28292.

With regards to the backbone config - I'm currently updating the modeling file and config to try and make this possible: #31145

cc @NielsRogge

yurithefury commented 5 months ago

Thank you for looking into this. BeitForSemanticSegmentation and Data2VecVisionForSemanticSegmentation break with similar errors. Is it worth opening separate issues?

amyeroberts commented 5 months ago

@yurithefury Yes please, as those models don't use a backbone the fix will be slightly different. This will help us better track once it's resolved. Thank you!

NielsRogge commented 5 months ago

Hi @yurithefury,

The dpt-swinv2-tiny-256 model needs to be loaded with DPTForDepthEstimation, not DPTForSemanticSegmentation. See the architecture here: https://huggingface.co/Intel/dpt-swinv2-tiny-256/blob/main/config.json#L5.

If you want to use DPT for semantic segmentation, this is a compatible checkpoint: https://huggingface.co/Intel/dpt-large-ade
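
For completeness, a minimal depth-estimation sketch for this checkpoint (following the pattern of the reproduction above; not verified here):

import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, DPTForDepthEstimation

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("Intel/dpt-swinv2-tiny-256")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-tiny-256")

inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

predicted_depth = outputs.predicted_depth  # relative depth map, shape (batch, height, width)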

amyeroberts commented 5 months ago

@NielsRogge It might be that a checkpoint is intended for a specific task; however, we should be able to correctly load it in any of the model classes, i.e. DPTModel, DPTForSemanticSegmentation, DPTForDepthEstimation, etc.

yurithefury commented 5 months ago

Thanks for clarifying @NielsRogge! I blindly assumed that all Intel/dpt models are suitable for semantic segmentation :facepalm:. Gotta pay more attention to task types when filtering through models.