huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

sd 1.5 inpainting model inference #476

Closed: mikob closed this issue 8 months ago

mikob commented 8 months ago

System Info

Trying to export an SD 1.5 inpainting model for inf2 via the Python API, as per the instructions, I get this error:

2024-02-11T22:30:43Z Running InferPSumTensor
2024-02-11T22:30:44Z InferPSumTensor finished after 0.726 seconds
2024-02-11T22:30:44Z Running WeightCoalescing
2024-02-11T22:30:44Z WeightCoalescing finished after 0.038 seconds
2024-02-11T22:30:44Z Running LegalizeSundaAccess
2024-02-11T22:30:44Z LegalizeSundaAccess finished after 0.152 seconds
2024-02-11T22:30:44Z Running RelaxPredicates
2024-02-11T22:30:44Z RelaxPredicates finished after 0.036 seconds
2024-02-11T22:30:44Z Running TensorInitialization
2024-02-11T22:30:44Z TensorInitialization finished after 0.331 seconds
2024-02-11T22:30:44Z Running TongaSimplifyPredicates
2024-02-11T22:30:44Z TongaSimplifyPredicates finished after 0.230 seconds
2024-02-11T22:30:44Z Running ExpandISAMacro
2024-02-11T22:30:44Z ExpandISAMacro finished after 0.029 seconds
2024-02-11T22:30:44Z Running SimplifyTongaTensor
2024-02-11T22:30:44Z SimplifyTongaTensor finished after 0.106 seconds
2024-02-11T22:30:44Z Running DMALocalityOpt
2024-02-11T22:30:44Z DMALocalityOpt finished after 0.010 seconds
2024-02-11T22:30:44Z Running DataStreaming
2024-02-11T22:30:45Z DataStreaming finished after 0.086 seconds
2024-02-11T22:30:45Z Running SFKVectorizer
2024-02-11T22:30:48Z SFKVectorizer finished after 3.312 seconds
2024-02-11T22:30:48Z Running LateLegalizeInst
2024-02-11T22:30:48Z LateLegalizeInst finished after 0.126 seconds
2024-02-11T22:30:48Z Running CoalesceCCOp
2024-02-11T22:30:48Z CoalesceCCOp finished after 0.031 seconds
2024-02-11T22:30:48Z Running SimpleAllReduceTiling
2024-02-11T22:30:48Z SimpleAllReduceTiling finished after 0.030 seconds
2024-02-11T22:30:48Z Running StaticProfiler
2024-02-11T22:30:48Z StaticProfiler finished after 0.073 seconds
2024-02-11T22:30:48Z Running SplitAPUnionSets
2024-02-11T22:30:49Z SplitAPUnionSets finished after 0.492 seconds
2024-02-11T22:30:49Z Running DumpGraphAndMetadata
2024-02-11T22:30:49Z DumpGraphAndMetadata finished after 0.074 seconds
2024-02-11T22:30:49Z Running BirCodeGenLoop
2024-02-11T22:30:52Z BirCodeGenLoop finished after 3.391 seconds
2024-02-11T22:47:19Z Compiler status PASS
[Compilation Time] 1034.13 seconds.
[Total compilation Time] 1526.98 seconds.
Traceback (most recent call last):
  File "/app/test.py", line 11, in <module>
    pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
  File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 372, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 530, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 659, in _export
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 494, in _from_pretrained
    data_parallel_mode = cls.set_default_dp_mode(configs["unet"])
KeyError: 'unet'

The script (test.py):

from optimum.neuron import NeuronStableDiffusionInpaintPipeline

size = 1024
model_id = "runwayml/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": size, "width": size}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd-v1-5-inpainting-neuronized/")

The dependencies come from this Docker image:

FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference-neuronx:2.10.1-neuronx-py310-sdk2.14.1-ubuntu20.04

RUN python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com && \
    python -m pip install 'optimum[neuronx, diffusers]'

WORKDIR /app
ADD ./test.py ./

CMD ["python", "test.py"]
Installed package versions:

diffusers==0.26.2
torch==1.13.1
torch-neuronx==1.13.1.1.12.0
tensorflow-neuron==2.10.1.2.10.1.0
tensorflow-neuronx==2.10.1.2.1.0
neuronx-cc==2.11.0.34+c5231f848
neuronx-distributed==0.5.0
neuronx-hwm==2.11.0.2+e34678757

I also tried via the optimum CLI using this AMI (https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2), but got an error:

Loading only U-Net into both Neuron Cores...
[W model.cpp:274] Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())
[W model.cpp:274] Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())
[W model.cpp:274] Warning: Model was compiled with a newer version of torch-neuron than the current runtime (function operator())
2024-Feb-12 05:25:26.0449    25:92    ERROR  NEFF:neff_parse                              NEFF version: 2.0, features: 0x100 are not supported.  Currently supporting: 0x80000000000000bf
2024-Feb-12 05:25:26.0449    25:92    ERROR  NMGR:kmgr_load_nn_post_metrics               Failed to load NN: /tmp/tmpeg2h88jn/graph.neff, err: 10
terminate called after throwing an instance of 'c10::Error'
  what():  Could not load the model status=10 message=Unsupported NEFF Version
Exception raised from NeuronModel at /opt/workspace/KaenaPyTorchRuntime/neuron_op/model.cpp:165 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f7587ff1457 in /usr/local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f7587fbb3ec in /usr/local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: neuron::NeuronModel::NeuronModel(std::string const&, std::basic_string_view<char, std::char_traits<char> > const&, int, int, unsigned int, unsigned int) + 0x1b68 (0x7f74e6c2ae08 in /usr/local/lib/python3.10/site-packages/torch_neuronx/lib/libtorchneuron.so)
frame #3: neuron::Model::blocking_load() + 0x152 (0x7f74e6d18662 in /usr/local/lib/python3.10/site-packages/torch_neuronx/lib/libtorchneuron.so)
frame #4: std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::shared_ptr<neuron::NeuronModel> (neuron::Model::*)(), neuron::Model*> > >::_M_run() + 0x31 (0x7f74e6d1b801 in /usr/local/lib/python3.10/site-packages/torch_neuronx/lib/libtorchneuron.so)
frame #5: <unknown function> + 0xd6df4 (0x7f75e0ef8df4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #6: <unknown function> + 0x8609 (0x7f765105e609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #7: clone + 0x43 (0x7f7650e29133 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

Compile commands:

python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
pip install "optimum[neuronx, diffusers]"
optimum-cli export neuron --model runwayml/stable-diffusion-inpainting \
--task stable-diffusion \
--batch_size 1 \
--height 1024 `# height in pixels of generated image, eg. 512, 768` \
--width 1024 `# width in pixels of generated image, eg. 512, 768` \
--num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
--auto_cast matmul `# cast only matrix multiplication operations` \
--auto_cast_type bf16 `# cast operations from FP32 to BF16` \
sd_neuron/


### Who can help?

@JingyaHuang 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction (minimal, reproducible, runnable)

See above.

### Expected behavior

No errors when running inference
JingyaHuang commented 8 months ago

Hi @mikob, thanks for reporting the issue. There is currently an incompatibility between the latest optimum-neuron and diffusers > 0.26.0. The issue is fixed in #458, and we are planning a patch release this week. Meanwhile, you can either continue with optimum-neuron built from source or downgrade to diffusers < 0.26 (the compilation itself will take approximately an hour).
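For example, pinning diffusers is a one-liner (the exact bound is up to you; anything below 0.26 should avoid the incompatibility):

pip install "diffusers<0.26"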

Let me know if you have any further questions.

JingyaHuang commented 8 months ago

And for the second error, it seems that you are loading the compiled model with a newer version of torch-neuron than the current runtime. Could you post the Neuron SDK versions with the following commands? I will try to reproduce it.

apt list --installed | grep aws-neuron
pip3 list | grep -e neuron -e tvm -e torch
mikob commented 8 months ago

@JingyaHuang thank you for the fast response! Recompiling now with diffusers 0.25.1, I'll let you know if it's successful.

> And for the second error, it seems that you are loading the compiled model with a newer version of torch-neuron than the current runtime. Could you post the Neuron SDK versions with the following commands? I will try to reproduce it.

I'm using the official AWS Neuron Docker image mentioned above, from here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers. Here is the output:

aws-neuronx-collectives/now 2.17.9.0-fb6d14044 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.17.7.0-df62e3f70 amd64 [installed,local]
aws-neuronx-tools/now 2.14.6.0 amd64 [installed,local]
root@e559460e2125:/app# pip3 list | grep -e neuron -e tvm -e torch
aws-neuronx-runtime-discovery 2.9
libneuronxla                  0.5.538
neuronx-cc                    2.11.0.34+c5231f848
neuronx-distributed           0.5.0
neuronx-hwm                   2.11.0.2+e34678757
optimum-neuron                0.0.13
tensorboard-plugin-neuron     2.4.6.0
tensorboard-plugin-neuronx    2.5.39.0
tensorflow-neuron             2.10.1.2.10.1.0
tensorflow-neuronx            2.10.1.2.1.0
torch                         1.13.1
torch-neuronx                 1.13.1.1.12.0
torch-xla                     1.13.1+torchneuronc
torchvision                   0.14.1
transformers-neuronx          0.8.268
mikob commented 8 months ago

@JingyaHuang downgraded to diffusers 0.25.1 and still got this error when compiling:

2024-02-12T16:15:38Z TongaLICM finished after 0.129 seconds
2024-02-12T16:15:38Z Running InferPSumTensor
2024-02-12T16:15:39Z InferPSumTensor finished after 0.715 seconds
2024-02-12T16:15:39Z Running WeightCoalescing
2024-02-12T16:15:39Z WeightCoalescing finished after 0.038 seconds
2024-02-12T16:15:39Z Running LegalizeSundaAccess
2024-02-12T16:15:39Z LegalizeSundaAccess finished after 0.156 seconds
2024-02-12T16:15:39Z Running RelaxPredicates
2024-02-12T16:15:39Z RelaxPredicates finished after 0.037 seconds
2024-02-12T16:15:39Z Running TensorInitialization
2024-02-12T16:15:40Z TensorInitialization finished after 0.345 seconds
2024-02-12T16:15:40Z Running TongaSimplifyPredicates
2024-02-12T16:15:40Z TongaSimplifyPredicates finished after 0.242 seconds
2024-02-12T16:15:40Z Running ExpandISAMacro
2024-02-12T16:15:40Z ExpandISAMacro finished after 0.031 seconds
2024-02-12T16:32:44Z Compiler status PASS
[Compilation Time] 1061.82 seconds.
[Total compilation Time] 1573.49 seconds.
Traceback (most recent call last):
  File "/app/test.py", line 12, in <module>
    pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
  File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 372, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 539, in _from_transformers
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 459, in _from_pretrained
    dynamic_batch_size=neuron_configs[DIFFUSION_MODEL_UNET_NAME].dynamic_batch_size,
KeyError: 'unet'
mikob commented 8 months ago

Tried re-installing optimum-neuron as mentioned in another issue; the same 'unet' error persists. Here are the full deps:

absl-py==2.0.0
accelerate==0.23.0
aiohttp==3.9.3
aiosignal==1.3.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
aws-neuronx-runtime-discovery==2.9
awscli==1.29.57
boto3==1.28.57
botocore==1.31.57
cachetools==5.3.1
certifi==2023.7.22
charset-normalizer==3.2.0
cloud-tpu-client==0.10
colorama==0.4.4
coloredlogs==15.0.1
Cython==0.29.36
datasets==2.17.0
diffusers==0.25.1
dill==0.3.8
docutils==0.16
ec2-metadata==2.10.0
falcon==2.0.0
filelock==3.13.1
flatbuffers==23.5.26
frozenlist==1.4.1
fsspec==2023.10.0
gast==0.4.0
gevent==21.12.0
google-api-core==1.34.0
google-api-python-client==1.8.0
google-auth==2.23.2
google-auth-httplib2==0.2.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
googleapis-common-protos==1.62.0
greenlet==1.1.3.post0
grpcio==1.56.2
gunicorn==20.1.0
h5py==3.9.0
httplib2==0.22.0
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.4
importlib-metadata==7.0.1
islpy==2023.1
jmespath==1.0.1
keras==2.10.0
Keras-Preprocessing==1.1.2
libclang==16.0.6
libneuronxla==0.5.538
lockfile==0.12.2
Markdown==3.4.4
MarkupSafe==2.1.3
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==2.6.3
neuronx-cc==2.11.0.34+c5231f848
neuronx-distributed==0.5.0
neuronx-hwm==2.11.0.2+e34678757
numpy==1.22.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauth2client==4.1.3
oauthlib==3.2.2
opt-einsum==3.3.0
optimum==1.16.2
optimum-neuron==0.0.18
packaging==23.1
pandas==2.2.0
pgzip==0.3.5
pillow==10.2.0
protobuf==3.19.6
psutil==5.9.5
pyarrow==15.0.0
pyarrow-hotfix==0.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pyparsing==3.1.1
python-daemon==3.0.1
python-dateutil==2.8.2
pytz==2024.1
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
requests-oauthlib==1.3.1
requests-unixsocket==0.3.0
rsa==4.7.2
s3transfer==0.7.0
safetensors==0.4.2
scipy==1.7.3
sentencepiece==0.1.99
six==1.16.0
sympy==1.12
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-neuron==2.4.6.0
tensorboard-plugin-neuronx==2.5.39.0
tensorboard-plugin-wit==1.8.1
tensorflow==2.10.1
tensorflow-estimator==2.10.0
tensorflow-io-gcs-filesystem==0.34.0
tensorflow-neuron==2.10.1.2.10.1.0
tensorflow-neuronx==2.10.1.2.1.0
tensorflow-serving-api==2.10.1
termcolor==2.3.0
tokenizers==0.15.1
torch==1.13.1
torch-neuronx==1.13.1.1.12.0
torch-xla==1.13.1+torchneuronc
torchvision==0.14.1
tqdm==4.66.2
transformers==4.36.2
transformers-neuronx==0.8.268
typing_extensions==4.8.0
tzdata==2023.4
uritemplate==3.0.1
urllib3==1.26.16
Werkzeug==2.3.7
wrapt==1.15.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
zope.event==5.0
zope.interface==6.0
JingyaHuang commented 8 months ago

Hi @mikob, thanks for trying.

My bad: the error you hit is not actually related to the fix I mentioned, but to the environment setup, which caused the UNet compilation to fail.

From your reproduction steps, you are using the TensorFlow DLC for neuronx, 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference-neuronx:2.10.1-neuronx-py310-sdk2.14.1-ubuntu20.04. This is not a recommended DLC for use with optimum-neuron.

If possible, I would suggest you use the following instead:

[STRONGLY RECOMMENDED] Hugging Face - PyTorch Neuronx DLC

If you choose to go with the PyTorch DLC, or your previous TensorFlow DLC with (manually installed?) PyTorch, you need to be very careful with package versioning: for Neuron SDK 2.15.0, the matching optimum-neuron version is 0.0.13 (for other package versions, you can check my PR for creating the HF DLC).
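Concretely, for Neuron SDK 2.15.0 that would mean pinning something like this (version taken from the note above; double-check other packages against the referenced PR):

pip install "optimum-neuron==0.0.13"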

[Ad] The Hugging Face DLC for Neuron SDK 2.16 is merged and will be released soon; this will unblock more features we recently added to optimum-neuron!

According to the log, the compilation itself seems fine, but loading the UNet onto the Neuron device failed. The warning "Model was compiled with a newer version of torch-neuron than the current runtime" is very confusing to me: torch-neuron is a dependency for inf1, so if you are using inf2 (torch-neuronx) it should not emit that kind of warning. Did you see this before @philschmid?

Will try to reproduce with the AMI and keep you posted.

JingyaHuang commented 8 months ago

And one small tip for when you are configuring your environment: you could first try a tiny SD checkpoint (e.g. hf-internal-testing/tiny-stable-diffusion-torch) to test the functionality before moving on to regular checkpoints. The tiny checkpoint takes only a few minutes to compile, which makes debugging much less painful.

mikob commented 8 months ago

@JingyaHuang thank you for checking on this.

> [STRONGLY RECOMMENDED] Hugging Face - PyTorch Neuronx DLC

I think you put the wrong link, because that's the DLC that I used (as shown above in the Dockerfile). Did you mean this DLC? 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04

which is found here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-neuron-inference-containers.

Thanks especially for the tip on the tiny model, that will save lots of time!

JingyaHuang commented 8 months ago

Fixed the link, thanks. Yes, it's the HF PyTorch DLC: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04.

And when you test with the tiny model, the max height and width are reduced to 64, so something like:

https://github.com/huggingface/optimum-neuron/blob/00ac6ce26e187f9753793ea0a60493781482ffbb/tests/cli/test_export_cli.py#L144-L173
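For instance, a minimal sketch along the lines of that test, staying within the 64x64 limit (the output directory name is arbitrary):

from optimum.neuron import NeuronStableDiffusionPipeline

# The tiny checkpoint compiles in minutes, so it is handy for validating the
# environment before spending ~25 minutes compiling the full SD 1.5 model.
model_id = "hf-internal-testing/tiny-stable-diffusion-torch"
input_shapes = {"batch_size": 1, "height": 64, "width": 64}
pipeline = NeuronStableDiffusionPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("tiny_sd_neuron/")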

mikob commented 8 months ago

@JingyaHuang sadly still hitting the same problem with the DLC you mentioned and the sd v1.5 inpainting model.

I tried with the test model first, and that got past the compilation step just fine.

Error:

2024-02-13T00:38:21Z Compiler status PASS
[Compilation Time] 1029.84 seconds.
[Total compilation Time] 1524.62 seconds.
Traceback (most recent call last):
  File "/app/test.py", line 13, in <module>
    pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
  File "/opt/conda/lib/python3.10/site-packages/optimum/modeling_base.py", line 372, in from_pretrained
    return from_pretrained_method(
  File "/opt/conda/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 539, in _from_transformers
    return cls._from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 459, in _from_pretrained
    dynamic_batch_size=neuron_configs[DIFFUSION_MODEL_UNET_NAME].dynamic_batch_size,
KeyError: 'unet'

Dockerfile:

FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04

WORKDIR /app
ADD ./test.py ./

CMD ["python", "test.py"]
test.py:

import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionInpaintPipeline

# compile & save
size = 1024
model_id = "runwayml/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": size, "width": size}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
# pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained('/sd_neuron')
pipeline.save_pretrained("sd_neuron/")

def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((size, size))
mask_image = download_image(mask_url).resize((size, size))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")
JingyaHuang commented 8 months ago

@mikob what instance type are you using, an inf2.xlarge or an inf2.8xlarge? (An inf2.xlarge can run out of memory during the CPU compilation of the UNet, but it's the optimal choice for inference.)

mikob commented 8 months ago

@JingyaHuang inf2.8xlarge

JingyaHuang commented 8 months ago

Hi @mikob, I just tested the HuggingFace Neuron AMI with the following steps:
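First, an assumed export step (mirroring the script earlier in the thread; not shown in this comment) to produce the sd_neuron/ directory:

from optimum.neuron import NeuronStableDiffusionInpaintPipeline

# Assumed compile/export step; shapes match the reporter's script.
model_id = "runwayml/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
pipeline.save_pretrained("sd_neuron/")

Then the inference script: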

import requests
from PIL import Image
from io import BytesIO

from optimum.neuron import NeuronStableDiffusionInpaintPipeline

pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained("sd_neuron/")

def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

size = 1024
init_image = download_image(img_url).resize((size, size))
mask_image = download_image(mask_url).resize((size, size))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")

And both the compilation and the inference work as expected. I will try exporting with the NeuronStableDiffusionInpaintPipeline class and the DLC.

JingyaHuang commented 8 months ago

Also tested with the AMI plus the snippet you used; it works fine as well, without any extra env setup.

Env info below:

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/unknown,now 2.19.7.0-530fb3064 amd64 [installed,upgradable to: 2.20.11.0-c101c322e]
aws-neuronx-dkms/unknown,now 2.15.9.0 amd64 [installed]
aws-neuronx-oci-hook/unknown,now 2.2.45.0 amd64 [installed]
aws-neuronx-runtime-lib/unknown,now 2.19.5.0-97e2d271b amd64 [installed,upgradable to: 2.20.11.0-b7d33e68b]
aws-neuronx-tools/unknown,now 2.16.1.0 amd64 [installed,upgradable to: 2.17.0.0]

aws-neuronx-runtime-discovery 2.9                 
diffusers                     0.25.0              
libneuronxla                  0.5.669             
neuronx-cc                    2.12.68.0+4480452af 
neuronx-distributed           0.6.0               
neuronx-hwm                   2.12.0.0+422c9037c  
optimum-neuron                0.0.17              
tensorboard-plugin-neuronx    2.6.1.0             
torch                         1.13.1              
torch-neuronx                 1.13.1.1.13.0       
torch-xla                     1.13.1+torchneurond 
torchvision                   0.14.1              
transformers                  4.36.2              
transformers-neuronx          0.9.474   
mikob commented 8 months ago

@JingyaHuang thank you, I was finally able to get it to work with that combination! Performance was pretty disappointing though, especially loading into the Neuron cores. I suppose the disk speed of the EBS volume is the bottleneck. I will leave the numbers here for posterity for others going down this path:

Loading only U-Net into both Neuron Cores...
100%|██████████| 50/50 [00:33<00:00, 1.47it/s]
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 52.323758602142334s
downloading images...
done downloading images took 0.24345898628234863s
100%|██████████| 50/50 [00:25<00:00, 1.92it/s]
done generating image took 27.799940824508667s
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 44.28734588623047s
downloading images...
done downloading images took 0.6663944721221924s
100%|██████████| 35/35 [00:20<00:00, 1.72it/s]
done generating image took 22.7090163230896s
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 49.001060009002686s
downloading images...
done downloading images took 0.3314809799194336s
100%|██████████| 35/35 [00:37<00:00, 1.08s/it]
done generating image took 39.03513050079346s
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 34.034974098205566s
downloading images...
done downloading images took 0.42200231552124023s
100%|██████████| 35/35 [00:37<00:00, 1.06s/it]
done generating image took 38.421977281570435s

JingyaHuang commented 8 months ago

Hi @mikob, did you do a warm-up run before the timed inference? The first run takes longer. Tested on my end with 50 inference steps, it takes around 18s per image:

import time
from io import BytesIO

import numpy as np
import requests
from PIL import Image

from optimum.neuron import NeuronStableDiffusionInpaintPipeline

pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained("Jingya/stable-diffusion-inpainting-neuronx")

def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

size = 1024
init_image = download_image(img_url).resize((size, size))
mask_image = download_image(mask_url).resize((size, size))

# Warm-up run (the first inference call is slower).
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")

for i in range(5):
    start_time = time.time()
    images = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images
    inf_time = time.time() - start_time
    print(f"[Inference Time] {np.round(inf_time, 2)} seconds.")
    print(f"Generated {len(images)} images.")



It's not blazing fast, but a bit quicker than in your experiment. Besides, for faster inference you could also consider models like LCM or SDXL-Turbo, which require fewer inference steps.
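If you stay with this model, one cheap lever is simply lowering the step count; num_inference_steps is the standard diffusers pipeline argument and is forwarded here (a sketch; image quality degrades at very low step counts):

# Fewer denoising steps give a roughly proportional speed-up, at some quality cost.
image = pipeline(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=25,  # diffusers default is 50
).images[0]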