Closed by oOraph 1 month ago
looks like the originally requested image size (the unet input shape), 1024, is actually rescaled twice (i.e. divided by vae_scale_factor ** 2 instead of vae_scale_factor)!
the first time when looking for the cache entry -> 1024 // 8 = 128, but then when exporting the model the input shape is rescaled a second time: 128 // 8 = 16
this is due to the fact that this call:
actually changes the input_shapes dict (rescaling the unet input shapes), but this very same dict is reused here https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/neuron/modeling_diffusion.py#L770
so using a deep copy of input_shapes in the first place should fix the issue
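A minimal sketch of the mutation bug and the deep-copy fix. `rescale_unet_shapes` is a hypothetical stand-in for the shape-rescaling step in the export path, not the real optimum-neuron helper:

```python
import copy

VAE_SCALE_FACTOR = 8  # assumption: the SD v1-5 VAE downscales by 8

def rescale_unet_shapes(input_shapes):
    # Hypothetical stand-in for the export helper: it mutates the dict
    # in place, dividing height/width by the VAE scale factor.
    input_shapes["height"] //= VAE_SCALE_FACTOR
    input_shapes["width"] //= VAE_SCALE_FACTOR
    return input_shapes

shapes = {"batch_size": 1, "height": 1024, "width": 1024}

# Buggy flow: the same dict is rescaled once for the cache lookup...
rescale_unet_shapes(shapes)
# ...then reused and rescaled a second time for the export: 1024 // 8 // 8 = 16
rescale_unet_shapes(shapes)
assert shapes["height"] == 16  # double rescale

# Fixed flow: hand each step a deep copy so the caller's dict stays intact.
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
lookup_shapes = rescale_unet_shapes(copy.deepcopy(input_shapes))
export_shapes = rescale_unet_shapes(copy.deepcopy(input_shapes))
assert lookup_shapes["height"] == export_shapes["height"] == 128
```

With the copies, lookup and export both see 128 and the original request keeps its 1024.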
side note: the script to reproduce with 0.22.0 is
```python
from optimum.neuron import NeuronStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16", "inline_weights_to_neff": False}
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **compiler_args, **input_shapes)
```
as the default value for inline_weights_to_neff changed between versions :) (because of a performance drop that, if I understood correctly, is still not resolved)
System Info
Who can help?
@JingyaHuang @oOraph
Information
Tasks
Reproduction (minimal, reproducible, runnable)
run the minimal script twice:
The model gets recompiled and exported twice
Logging details:
Lookup:
Export:
Troubleshoot:
there is a mismatch between the hash checked when looking for cached exports and the hash computed after building an export. Consequently, the model is never found in the cache
the problem arises because there is a diff between the cache configs here (lookup):
https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/neuron/modeling_diffusion.py#L752
and here (export):
https://github.com/huggingface/optimum-neuron/blob/7e21931b6d74d21997ee5f5ea8742ab7da977e29/optimum/exporters/neuron/convert.py#L410
The diff is really minimal: it is in the unet config
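To illustrate why the lookup never hits, here is a sketch of how a shape diff in the unet config produces diverging hashes. `config_hash` is a hypothetical stand-in for the cache-entry hashing, not the actual optimum-neuron implementation:

```python
import hashlib
import json

def config_hash(config):
    # Hypothetical stand-in for the cache hash: a stable digest of the config.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

# Shapes as seen at lookup time (rescaled once: 1024 // 8 = 128)...
lookup_config = {"model_type": "unet", "height": 128, "width": 128}
# ...versus at export time (rescaled a second time: 128 // 8 = 16).
export_config = {"model_type": "unet", "height": 16, "width": 16}

# The configs differ, so the hashes differ and the cache always misses.
assert config_hash(lookup_config) != config_hash(export_config)
```

With the deep-copy fix both sides hash the same shapes, so the second run finds the cached export.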
Expected behavior
Exporting once locally should be enough; the model should not be recompiled every time