huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

caching diffusion models does not work #593

Closed oOraph closed 1 month ago

oOraph commented 1 month ago

System Info

optimum-neuron 0.21.0
ii  aws-neuronx-collectives           2.20.22.0-c101c322e                     amd64        neuron_ccom built using CMake
ii  aws-neuronx-dkms                  2.16.7.0                                amd64        aws-neuronx driver in DKMS format.
ii  aws-neuronx-runtime-lib           2.20.22.0-1b3ca6425                     amd64        neuron_runtime built using CMake
ii  aws-neuronx-tools                 2.17.1.0                                amd64        Neuron profile and debug tools

Who can help?

@JingyaHuang @oOraph

Information

Tasks

Reproduction (minimal, reproducible, runnable)

Run the following minimal script twice:

from optimum.neuron import NeuronStableDiffusionPipeline
model_id = "runwayml/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)

The model gets recompiled and exported on both runs, instead of being loaded from the cache on the second run.

Logging details:

Lookup:

neuronxcc-2.13.66.0+6dfecc895/MODULE_500c5bd266330f2a15fa

Export:

neuronxcc-2.13.66.0+6dfecc895/MODULE_323b66a01f7f04552356

Troubleshoot:

There is a mismatch between the hash checked when looking up cached exports and the hash computed after building an export. Consequently, the model is never found in the cache.

The problem arises because there is a diff between the cache config built here (lookup):

https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/neuron/modeling_diffusion.py#L752

and here (export):

https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/exporters/neuron/convert.py#L410

The diff is really minimal:

85c85
          "static_height" : 128,
---
          "static_height" : 16,
88c88
          "static_width" : 128
---
          "static_width" : 16

in the UNet config.
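
To illustrate why such a small config diff defeats the cache entirely, here is a minimal sketch (the hash_config helper and the SHA-256 truncation are illustrative assumptions, not the actual optimum-neuron hashing code): two configs that differ only in static_height/static_width hash to two different keys, so the key checked at lookup time can never match the key written at export time.

import hashlib
import json

def hash_config(config: dict) -> str:
    # Hash a canonical JSON serialization of the config, similar in spirit
    # to deriving a compilation-cache lookup key from the neuron config.
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:20]

lookup_config = {"batch_size": 1, "static_height": 128, "static_width": 128}
export_config = {"batch_size": 1, "static_height": 16, "static_width": 16}

print(hash_config(lookup_config))  # key checked before compiling
print(hash_config(export_config))  # key stored after compiling -> never matches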

Expected behavior

Exporting once locally should be enough; the model should not be recompiled every time.

oOraph commented 1 month ago

It looks like the originally requested image size (the UNet input shape), 1024, is actually rescaled twice (i.e. divided by vae_scale_factor ** 2 instead of vae_scale_factor)!

The first rescaling happens when looking up the cache entry (1024 // 8 = 128), but then when exporting the model the input shape is rescaled a second time: 128 // 8 = 16.

This is due to the fact that this call:

https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/neuron/modeling_diffusion.py#L701

actually changes the input_shapes dict in place (rescaling the UNet input shapes), but this very same dict is then reused here: https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/neuron/modeling_diffusion.py#L770

So using a deep copy of input_shapes in the first place should fix the issue.
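
For clarity, here is a minimal sketch of the mutation pattern described above (rescale_for_unet and VAE_SCALE_FACTOR are illustrative stand-ins, not the actual optimum-neuron code): the first call rescales the caller's dict in place, the reused dict then gets rescaled a second time on the export path, while rescaling a deep copy leaves the original shapes intact.

import copy

VAE_SCALE_FACTOR = 8  # stand-in for the pipeline's vae_scale_factor

def rescale_for_unet(input_shapes: dict) -> dict:
    # Mimics the problematic behaviour: mutates the caller's dict in place.
    input_shapes["height"] //= VAE_SCALE_FACTOR
    input_shapes["width"] //= VAE_SCALE_FACTOR
    return input_shapes

input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}

rescale_for_unet(input_shapes)   # cache lookup path: 1024 -> 128
rescale_for_unet(input_shapes)   # export path reuses the same dict: 128 -> 16
print(input_shapes)              # {'batch_size': 1, 'height': 16, 'width': 16}

# Fix: rescale a deep copy so the original shapes reach the export path untouched.
fresh_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
rescale_for_unet(copy.deepcopy(fresh_shapes))
print(fresh_shapes)              # {'batch_size': 1, 'height': 1024, 'width': 1024}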

oOraph commented 1 month ago

PR https://github.com/huggingface/optimum-neuron/pull/594

oOraph commented 1 month ago

Side note: with 0.22.0, the script to reproduce is

from optimum.neuron import NeuronStableDiffusionPipeline
model_id = "runwayml/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16", "inline_weights_to_neff": False}
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)

since the default value of inline_weights_to_neff changed between versions :) (because of a performance regression that, if I understood correctly, is still unresolved).