huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Can't prepare model after Quanto is applied on DistributedDataParallel #3040

Closed: bghira closed this issue 1 month ago

bghira commented 2 months ago

System Info

Information

Tasks

Reproduction

import torch, accelerate
from diffusers import FluxTransformer2DModel
from optimum.quanto import quantize, qint8, freeze
weight_dtype = torch.bfloat16

accelerator = accelerate.Accelerator()

bfl_model = 'black-forest-labs/FLUX.1-dev'
transformer = FluxTransformer2DModel.from_pretrained(bfl_model, torch_dtype=torch.bfloat16, subfolder="transformer")

# you might need 'with accelerator.main_process_first()' if your server lacks system mem
print('quantizing')
quantize(transformer, qint8)
print('freezing')
freeze(transformer)

tpacked_noisy_latents = torch.randn(1, 1024, 64, dtype=weight_dtype, device=accelerator.device)
tpooled_projections = torch.randn(1, 768, dtype=weight_dtype, device=accelerator.device)
ttimesteps = torch.randn(1, dtype=weight_dtype, device=accelerator.device)
tguidance = torch.randn(1, dtype=weight_dtype, device=accelerator.device)
tencoder_hidden_states = torch.randn(1, 512, 4096, dtype=weight_dtype, device=accelerator.device)
ttxt_ids = torch.randn(1, 512, 3, dtype=weight_dtype, device=accelerator.device)
timg_ids = torch.randn(1, 4320, 3, dtype=weight_dtype, device=accelerator.device)

#with torch.no_grad():
#    model_pred = transformer(
#        hidden_states=tpacked_noisy_latents,
#        timestep=ttimesteps,
#        guidance=tguidance,
#        pooled_projections=tpooled_projections,
#        encoder_hidden_states=tencoder_hidden_states,
#        txt_ids=ttxt_ids,
#        img_ids=timg_ids,
#        joint_attention_kwargs=None,
#        return_dict=False,
#    )
transformer = accelerator.prepare(transformer)

Running this script shows that there are uninitialised, empty parameters after the accelerator.prepare() call.
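
For context, a minimal check along these lines (not part of the original script, names are illustrative) can list which parameters come back empty or still on the meta device after prepare():

# hypothetical diagnostic: flag parameters that look uninitialised after prepare()
for name, param in transformer.named_parameters():
    if param.is_meta or param.numel() == 0:
        print(f"uninitialised parameter: {name}, shape={tuple(param.shape)}, device={param.device}")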

Expected behavior

accelerator.prepare() should succeed and return the quantized model with all parameters materialised.
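
As a sketch of the expected outcome (assuming a multi-process run started with accelerate launch), prepare() would hand back the transformer wrapped for DistributedDataParallel with its quantized weights still materialised on the local device:

# sketch only: what a successful prepare() would look like under DDP
prepared = accelerator.prepare(transformer)
print(type(prepared))                      # torch.nn.parallel.DistributedDataParallel in multi-GPU runs
print(next(prepared.parameters()).device)  # the local accelerator device, not meta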

bghira commented 2 months ago

cc @sayakpaul and @muellerzr

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

bghira commented 1 month ago

This now seems to work on accelerate v0.34.2 and diffusers v0.30.3.
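
A quick way to confirm an environment matches the combination reported as working (a minimal sketch, not from the original comment):

# sketch: verify installed versions against the reported working combination
import accelerate
import diffusers
print(accelerate.__version__)  # reported working: 0.34.2
print(diffusers.__version__)   # reported working: 0.30.3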