huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.

Error running SDXL Turbo on both CPU and inf2 #364

Closed: mmcclean-aws closed this issue 11 months ago

mmcclean-aws commented 12 months ago

I am trying to run the SDXL model with Optimum Neuron on inf2 and am getting the following error when compiling the UNet portion:

2023-12-03 22:21:00.337954: F /opt/workspace/KaenaTorchXlaWheels/build/private/pytorch/xla/third_party/tensorflow/bazel-tensorflow/tensorflow/compiler/xla/xla_client/debug_macros.h:27] Non-OK-status: status.status() status: INVALID_ARGUMENT: Cannot infer shape for dot operation: f32[1,2304] <dot> f32[2816,1280]. Contracting dimension sizes do not match.

I also tried running the dummy trace input through the CPU version and got a similar error:

An error occured when trying to trace unet with the error message: mat1 and mat2 shapes cannot be multiplied (1x2304 and 2816x1280)

It seems there is a problem with the model code.
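For what it's worth, the numbers line up with the SDXL-specific added conditioning (my reading of the shapes, not something confirmed below): the SDXL UNet's add_embedding layer expects 1280 + 6 × 256 = 2816 input features, i.e. the pooled embeddings from the second text encoder (1280) concatenated with six time-id embeddings of 256 each, while a non-XL pipeline only supplies 768 + 6 × 256 = 2304 features from the first text encoder, which matches the 2304-vs-2816 mismatch in both errors above.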

My setup is the following:

(aws_neuron_venv_pytorch) ubuntu@ip-172-31-50-175:~/sd$ pip list | grep 'optimum\|transformers\|neuron'
aws-neuronx-runtime-discovery 2.9
libneuronxla                  0.5.538
neuronperf                    1.8.9.0+5dccae385
neuronx-cc                    2.11.0.34+c5231f848
neuronx-distributed           0.5.0
neuronx-hwm                   2.11.0.2+e34678757
optimum                       1.14.1
optimum-neuron                0.0.15
torch-neuronx                 1.13.1.1.12.0
torch-xla                     1.13.1+torchneuronc
transformers                  4.35.0
transformers-neuronx          0.8.268
mmcclean-aws commented 12 months ago

I also tried upgrading to transformers==4.35.2 and diffusers==0.24.0, with no luck.

mmcclean-aws commented 11 months ago

Switching from NeuronStableDiffusionPipeline to NeuronStableDiffusionXLPipeline makes compilation work; however, I now see the following error at the unet stage:

[Compilation Time] 348.65 seconds.
[Total compilation Time] 757.32 seconds.
Traceback (most recent call last):
  File "sdxl-turbo.py", line 9, in <module>
    stable_diffusion = NeuronStableDiffusionXLPipeline.from_pretrained(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/optimum/modeling_base.py", line 372, in from_pretrained
    return from_pretrained_method(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/optimum/neuron/modeling_diffusion.py", line 637, in _from_transformers
    return cls._from_pretrained(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/optimum/neuron/modeling_diffusion.py", line 492, in _from_pretrained
    data_parallel_mode = cls.set_default_dp_mode(configs["unet"])
KeyError: 'unet'
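
For reference, sdxl-turbo.py is essentially the snippet below; only the from_pretrained call at line 9 appears in the traceback, so the input shapes and compiler arguments here are assumptions rather than the exact script:

from optimum.neuron import NeuronStableDiffusionXLPipeline

# Compile sdxl-turbo for Neuron (static shapes and casting flags assumed).
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}

stable_diffusion = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",
    export=True,  # trace and compile the text encoders, unet and vae
    **compiler_args,
    **input_shapes,
)
stable_diffusion.save_pretrained("sdxl_turbo_neuron/")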
JingyaHuang commented 11 months ago

Hi @mmcclean-aws ,

Yes, for sdxl-turbo we should use the SDXL pipeline instead of the SD pipeline.

I just tested with stabilityai/sdxl-turbo, and it is compatible with the SDXL exporter and pipeline that we have put in place in Optimum Neuron:

optimum-cli export neuron --model stabilityai/sdxl-turbo --task stable-diffusion-xl --batch_size 1 --height 512 --width 512 --auto_cast matmul --auto_cast_type bf16 sdxl_turbo_neuron/
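
A note on the flags: --batch_size, --height and --width fix the static input shapes that the Neuron compiler requires, and --auto_cast matmul with --auto_cast_type bf16 casts the matrix multiplications to bfloat16 for faster inference.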

You can find compiled artifacts here: Jingya/sdxl-turbo-neuronx

# [Inference]
import time

import numpy as np
from optimum.neuron import NeuronStableDiffusionXLPipeline

repo_id = "Jingya/sdxl-turbo-neuronx"
pipe = NeuronStableDiffusionXLPipeline.from_pretrained(repo_id, data_parallel_mode="all")

prompt = ["Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"]*2

# Warm-up call: excludes one-time initialization overhead from the timings below
images = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images

for i in range(5):
    start_time = time.time()
    images = pipe(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images
    inf_time = time.time() - start_time
    print(f"[Inference Time] {np.round(inf_time, 2)} seconds.")
    images[0].save(f"image_{i}.png")

print(f"Generated {len(images)} images.")
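
On data_parallel_mode="all": it loads the whole pipeline onto both NeuronCores of the Inferentia2 device, which is why the prompt is duplicated above; the two prompts are then processed in parallel, one per core.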

[two generated example images attached]

Can you send me your code so that I can try to reproduce the issue that you have seen? Thanks!

Mystorius commented 11 months ago

Hi @JingyaHuang, thank you for providing the compiled model. However, when I try to run it on my AWS EC2 inf2.8xlarge instance, I get the following error:

exporter_config_constructor = TasksManager._SUPPORTED_MODEL_TYPE[model_type][exporter][task]
KeyError: 'clip_text_model'

The key "clip_text_model" is used in the config.json of your two text encoders. Do you know how I can fix this error? I also saw that you compiled the models with transformers_version: "4.36.0.dev0", which I do not have access to (I did not find a release for it on GitHub).

Any help would be appreciated :)

JingyaHuang commented 11 months ago

Hi @Mystorius,

The transformers version won't be an issue here; it will work fine if you use the 4.35.2 release instead.

As for the error you met, it seems that the CLIP text model hasn't been correctly registered in your environment: https://github.com/huggingface/optimum-neuron/blob/aabcedb9117d300994a77a2290c53c52b2671421/optimum/exporters/neuron/model_configs.py#L218-L220

Could you try uninstalling and re-installing optimum-neuron, and then try again:

pip uninstall optimum-neuron
pip install optimum-neuron
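
After re-installing, a quick way to check that the registration is picked up (assuming, per the file linked above, that importing the Neuron model configs runs the register_in_tasks_manager decorators, and using the registry attribute from your traceback):

# Quick registration check; should print True in a working environment.
import optimum.exporters.neuron.model_configs  # registers the Neuron export configs
from optimum.exporters.tasks import TasksManager

print("clip_text_model" in TasksManager._SUPPORTED_MODEL_TYPE)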

If it still doesn't work, could you share your optimum and optimum-neuron versions along with a minimal script so that I can try to reproduce it? Thanks!

JingyaHuang commented 11 months ago

Closing, as the issue is solved with the above script. The associated snippets have been updated in the Optimum Neuron documentation as well.

Mystorius commented 11 months ago

Hi @JingyaHuang, sorry for the late reply. Everything is working for us now, thanks. Re-installing optimum-neuron helped.