huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Incorrect output when using accelerate in a PyTorch UNet model #2849

Open cporrasn opened 3 weeks ago

cporrasn commented 3 weeks ago

System Info

Good morning! I'm trying to use accelerate to distribute a UNet model that has already been trained. I need model and tensor parallelism because a single image does not fit on a single GPU, so running inference on a single GPU raises an out-of-memory error.

I load the .pth file like this:

model = UNet(3,1)
model = model.to(memory_format=torch.channels_last)
state_dict = torch.load("model.pth", map_location="cpu")
del state_dict['mask_values']  # drop the extra, non-parameter key saved alongside the weights
model.load_state_dict(state_dict)
model.eval()

Afterwards, I use:

model = prepare_pippy(model, example_args=(input,))
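
For context, prepare_pippy traces the model with example_args to split it into pipeline stages, so the example input should match the shape, dtype, and memory format of the real inference input. A minimal sketch of that call, with a hypothetical 512x512 input size:

from accelerate.inference import prepare_pippy
import torch

# Hypothetical example shape; it should mirror the real inference input exactly.
example_input = torch.randn(1, 3, 512, 512).to(memory_format=torch.channels_last)
model = prepare_pippy(model, example_args=(example_input,))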

Then, I try to run inference like this:

with torch.no_grad():
    output = model(input)

I have 2 GPUs, and I can see that when I run it, the load is distributed across both GPUs.

The model performs image segmentation and should return a segmentation mask; however, the output is completely wrong: it returns an image with seemingly random blank pixels.

The same model run on a single GPU with a small image produces the correct result.

If you have any idea how to solve this, please let me know.

Reproduction

import torch

from accelerate.inference import prepare_pippy
from unet import UNet  # hypothetical import path; UNet is the reporter's own model class

model = UNet(3, 1)
model = model.to(memory_format=torch.channels_last)
state_dict = torch.load("model.pth", map_location="cpu")
del state_dict['mask_values']  # drop the extra, non-parameter key saved alongside the weights
model.load_state_dict(state_dict)
model.eval()

# `input` is the preprocessed image tensor (defined elsewhere by the reporter)
model = prepare_pippy(model, example_args=(input,))

with torch.no_grad():
    output = model(input)
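
Pipeline-parallel inference runs one process per GPU, so the script above would typically be started with a multi-process launcher; for example, assuming it is saved as repro.py (a hypothetical filename):

accelerate launch --num_processes 2 repro.py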

Expected behavior

A correctly segmented image

muellerzr commented 3 weeks ago

Are you making sure to gather the results at the end (or look at the last process only)? Otherwise you'll only have the intermediate results on each GPU. Please see the chunk in the examples here: https://github.com/huggingface/accelerate/blob/main/examples/inference/pippy/bert.py#L74-L76

from accelerate import PartialState

# The outputs are only on the final process by default
if PartialState().is_last_process:
    output = torch.stack(tuple(output[0]))
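
Applied to the UNet repro above, this would look something like the following sketch; the unpacking of output and the thresholding step are assumptions modeled on the linked BERT example, not confirmed against the reporter's model:

with torch.no_grad():
    output = model(input)

if PartialState().is_last_process:
    # Unpacking assumed to match the linked example, where the first element
    # holds the logits; adjust for whatever the pipelined UNet actually returns.
    logits = torch.stack(tuple(output[0]))
    # Hypothetical post-processing for a single-channel UNet: threshold to a binary mask
    mask = torch.sigmoid(logits) > 0.5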