Mask2FormerImageProcessor - fails to process multichannel image

maciej-adamiak commented 1 week ago

System Info

transformers version: 4.43.4
Platform: Linux-6.5.0-1027-oem-x86_64-with-glibc2.35
Python version: 3.11.9
Huggingface_hub version: 0.24.6
Safetensors version: 0.4.4
Accelerate version: not installed
Accelerate config: not found
PyTorch version (GPU?): 2.4.0 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?: no
Using GPU in script?: yes
GPU type: NVIDIA RTX A4500 Laptop GPU

Who can help?

No response

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

Colab

Expected behavior

The processor should handle multichannel images correctly.

It looks like get_max_height_width falls back to its default parameter rather than using the input_data_format value set in the class constructor.
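To illustrate the suspected failure mode, here is a minimal NumPy-only sketch. It is not the transformers implementation: `infer_channel_axis` and `max_height_width` are hypothetical stand-ins, and the assumption that inference only recognizes 1 or 3 channels is mine. It shows why a 6-channel image might fail when the format is inferred, yet work when the format is passed explicitly.

```python
import numpy as np


def infer_channel_axis(image, num_channels=(1, 3)):
    """Hypothetical helper: guess the layout from the first/last axis size."""
    if image.shape[0] in num_channels:
        return "channels_first"
    if image.shape[-1] in num_channels:
        return "channels_last"
    raise ValueError("Unable to infer channel dimension format")


def max_height_width(images, input_data_format=None):
    """Hypothetical stand-in: largest height and width across a batch."""
    if input_data_format is None:
        # Falling back to inference is where a multichannel image can break.
        input_data_format = infer_channel_axis(images[0])
    if input_data_format == "channels_first":
        sizes = [img.shape[1:] for img in images]
    elif input_data_format == "channels_last":
        sizes = [img.shape[:2] for img in images]
    else:
        raise ValueError(f"Unknown format: {input_data_format}")
    heights, widths = zip(*sizes)
    return max(heights), max(widths)


# Two 6-channel, channels-first images of different spatial sizes.
images = [np.zeros((6, 32, 48)), np.zeros((6, 40, 24))]

try:
    max_height_width(images)  # format not forwarded, so it is inferred
    inferred_ok = True
except ValueError:
    inferred_ok = False

# Forwarding the format explicitly avoids the inference step entirely.
explicit = max_height_width(images, input_data_format="channels_first")
```

In this sketch the explicit call succeeds while the inferred one raises, which is the behavior one would expect if the processor's configured input_data_format never reaches the helper.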

LysandreJik commented 1 week ago

cc @qubvel or @zucchini-nlp in case you have some bandwidth!

qubvel commented 1 week ago

Hi @maciej-adamiak, thanks for opening an issue and providing an example to reproduce!

Would you like to make a fix? We should make sure that input_data_format is passed to get_max_height_width and that all other transforms handle it correctly.

maciej-adamiak commented 1 week ago

I'm on it.
