huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

device_map="auto" leads to `Expected all tensors to be on the same device` error on generate call #2847

Open · ryan-caesar-ramos opened this issue 3 weeks ago

ryan-caesar-ramos commented 3 weeks ago

System Info

- `Accelerate` version: 0.27.0
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Python version: 3.11.5
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.0+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 503.75 GB
- GPU type: NVIDIA RTX A6000
- `Accelerate` default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 4
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: all
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []


Reproduction

I think the following code should reproduce the issue:

# most code below taken from https://huggingface.co/Salesforce/instructblip-vicuna-7b#intended-uses--limitations

from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration
import torch
from PIL import Image
import requests

model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b", device_map="auto")
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")

url = "https://raw.githubusercontent.com/salesforce/LAVIS/main/docs/_static/Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = "What is unusual about this image?"

# as per https://huggingface.co/docs/accelerate/en/concept_guides/big_model_inference#the-devicemap
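# (the guide sends inputs to the first device, GPU 0 here, since the input
# embeddings are typically the first weights placed under device_map="auto")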
inputs = processor(images=image, text=prompt, return_tensors="pt").to(0)

outputs = model.generate(
        **inputs,
        do_sample=False,
        num_beams=5,
        max_length=256,
        min_length=1,
        top_p=0.9,
        repetition_penalty=1.5,
        length_penalty=1.0,
        temperature=1,
)
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
print(generated_text)

Expected behavior

I expected the code to run without errors, but instead I get `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0! (when checking argument for argument index in method wrapper_CUDA_gather)`.
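
A minimal isolation sketch that might help narrow this down (assuming the model and inputs from the script above; `hf_device_map` is the placement dict transformers attaches when loading with device_map="auto", and num_beams=1 here is only an isolation step, not a fix):

# show which submodule landed on which GPU; both devices from the error
# (cuda:0 and cuda:3) should appear in this mapping
print(model.hf_device_map)

# the failing wrapper_CUDA_gather is characteristic of beam-search reordering,
# so checking whether greedy decoding succeeds helps localize the bug
outputs = model.generate(**inputs, do_sample=False, num_beams=1, max_length=256)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0].strip())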

SunMarc commented 1 week ago

Hi @ryan-caesar-ramos, thanks for reporting and sorry for the delay. Could you open the issue on transformers, since this is not an issue with accelerate? Also, could you share the entire traceback? Thanks!