Closed mikob closed 8 months ago
Hi @mikob, thanks for reporting the issue. There is currently an incompatibility between the latest optimum-neuron and diffusers > 0.26.0. The issue is fixed in #458, and we are planning a patch release this week. In the meantime, you can either use optimum-neuron built from source or stay on diffusers <0.26.* (the compilation will take approximately an hour).
Let me know if you have any further questions.
As for the second error, it seems that you are loading the compiled model with a newer version of torch-neuron. Could you post the Neuron SDK versions using the following commands? I will try to reproduce it.
apt list --installed | grep aws-neuron
pip3 list | grep -e neuron -e tvm -e torch
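For anyone who prefers to collect the Python package versions from inside the same interpreter that runs the pipeline, here is a minimal sketch that mirrors the pip3 filter above. The helper names (filter_packages, installed_pairs) are hypothetical and only the standard library is used; it does not cover the apt packages.

```python
from importlib import metadata

# Same keywords as the `grep -e neuron -e tvm -e torch` filter above.
KEYWORDS = ("neuron", "tvm", "torch")


def filter_packages(pairs, keywords=KEYWORDS):
    """Keep (name, version) pairs whose name contains one of the keywords."""
    return {
        name: version
        for name, version in pairs
        if any(keyword in name.lower() for keyword in keywords)
    }


def installed_pairs():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            yield name, dist.version


if __name__ == "__main__":
    for name, version in sorted(filter_packages(installed_pairs()).items()):
        print(f"{name}=={version}")
```

Running it inside the container prints one name==version line per matching package, which can be pasted into the issue as-is.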
@JingyaHuang thank you for the fast response! Recompiling now with diffusers 0.25.1, I'll let you know if it's successful.
I'm using the official AWS neuron docker image mentioned above from here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers, here is the output:
aws-neuronx-collectives/now 2.17.9.0-fb6d14044 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.17.7.0-df62e3f70 amd64 [installed,local]
aws-neuronx-tools/now 2.14.6.0 amd64 [installed,local]
root@e559460e2125:/app# pip3 list | grep -e neuron -e tvm -e torch
aws-neuronx-runtime-discovery 2.9
libneuronxla 0.5.538
neuronx-cc 2.11.0.34+c5231f848
neuronx-distributed 0.5.0
neuronx-hwm 2.11.0.2+e34678757
optimum-neuron 0.0.13
tensorboard-plugin-neuron 2.4.6.0
tensorboard-plugin-neuronx 2.5.39.0
tensorflow-neuron 2.10.1.2.10.1.0
tensorflow-neuronx 2.10.1.2.1.0
torch 1.13.1
torch-neuronx 1.13.1.1.12.0
torch-xla 1.13.1+torchneuronc
torchvision 0.14.1
transformers-neuronx 0.8.268
@JingyaHuang downgraded to diffusers 0.25.1 and still got this error when compiling:
2024-02-12T16:15:38Z TongaLICM finished after 0.129 seconds
2024-02-12T16:15:38Z Running InferPSumTensor
2024-02-12T16:15:39Z InferPSumTensor finished after 0.715 seconds
2024-02-12T16:15:39Z Running WeightCoalescing
2024-02-12T16:15:39Z WeightCoalescing finished after 0.038 seconds
2024-02-12T16:15:39Z Running LegalizeSundaAccess
2024-02-12T16:15:39Z LegalizeSundaAccess finished after 0.156 seconds
2024-02-12T16:15:39Z Running RelaxPredicates
2024-02-12T16:15:39Z RelaxPredicates finished after 0.037 seconds
2024-02-12T16:15:39Z Running TensorInitialization
2024-02-12T16:15:40Z TensorInitialization finished after 0.345 seconds
2024-02-12T16:15:40Z Running TongaSimplifyPredicates
2024-02-12T16:15:40Z TongaSimplifyPredicates finished after 0.242 seconds
2024-02-12T16:15:40Z Running ExpandISAMacro
2024-02-12T16:15:40Z ExpandISAMacro finished after 0.031 seconds
2024-02-12T16:32:44Z Compiler status PASS
[Compilation Time] 1061.82 seconds.
[Total compilation Time] 1573.49 seconds.
Traceback (most recent call last):
File "/app/test.py", line 12, in <module>
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 372, in from_pretrained
return from_pretrained_method(
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 539, in _from_transformers
return cls._from_pretrained(
File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 459, in _from_pretrained
dynamic_batch_size=neuron_configs[DIFFUSION_MODEL_UNET_NAME].dynamic_batch_size,
KeyError: 'unet'
Tried re-installing optimum-neuron as mentioned in another issue. The same KeyError: 'unet' persists. Here are the full deps:
absl-py==2.0.0
accelerate==0.23.0
aiohttp==3.9.3
aiosignal==1.3.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
aws-neuronx-runtime-discovery==2.9
awscli==1.29.57
boto3==1.28.57
botocore==1.31.57
cachetools==5.3.1
certifi==2023.7.22
charset-normalizer==3.2.0
cloud-tpu-client==0.10
colorama==0.4.4
coloredlogs==15.0.1
Cython==0.29.36
datasets==2.17.0
diffusers==0.25.1
dill==0.3.8
docutils==0.16
ec2-metadata==2.10.0
falcon==2.0.0
filelock==3.13.1
flatbuffers==23.5.26
frozenlist==1.4.1
fsspec==2023.10.0
gast==0.4.0
gevent==21.12.0
google-api-core==1.34.0
google-api-python-client==1.8.0
google-auth==2.23.2
google-auth-httplib2==0.2.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
googleapis-common-protos==1.62.0
greenlet==1.1.3.post0
grpcio==1.56.2
gunicorn==20.1.0
h5py==3.9.0
httplib2==0.22.0
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.4
importlib-metadata==7.0.1
islpy==2023.1
jmespath==1.0.1
keras==2.10.0
Keras-Preprocessing==1.1.2
libclang==16.0.6
libneuronxla==0.5.538
lockfile==0.12.2
Markdown==3.4.4
MarkupSafe==2.1.3
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==2.6.3
neuronx-cc==2.11.0.34+c5231f848
neuronx-distributed==0.5.0
neuronx-hwm==2.11.0.2+e34678757
numpy==1.22.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauth2client==4.1.3
oauthlib==3.2.2
opt-einsum==3.3.0
optimum==1.16.2
optimum-neuron==0.0.18
packaging==23.1
pandas==2.2.0
pgzip==0.3.5
pillow==10.2.0
protobuf==3.19.6
psutil==5.9.5
pyarrow==15.0.0
pyarrow-hotfix==0.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pyparsing==3.1.1
python-daemon==3.0.1
python-dateutil==2.8.2
pytz==2024.1
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
requests-oauthlib==1.3.1
requests-unixsocket==0.3.0
rsa==4.7.2
s3transfer==0.7.0
safetensors==0.4.2
scipy==1.7.3
sentencepiece==0.1.99
six==1.16.0
sympy==1.12
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-neuron==2.4.6.0
tensorboard-plugin-neuronx==2.5.39.0
tensorboard-plugin-wit==1.8.1
tensorflow==2.10.1
tensorflow-estimator==2.10.0
tensorflow-io-gcs-filesystem==0.34.0
tensorflow-neuron==2.10.1.2.10.1.0
tensorflow-neuronx==2.10.1.2.1.0
tensorflow-serving-api==2.10.1
termcolor==2.3.0
tokenizers==0.15.1
torch==1.13.1
torch-neuronx==1.13.1.1.12.0
torch-xla==1.13.1+torchneuronc
torchvision==0.14.1
tqdm==4.66.2
transformers==4.36.2
transformers-neuronx==0.8.268
typing_extensions==4.8.0
tzdata==2023.4
uritemplate==3.0.1
urllib3==1.26.16
Werkzeug==2.3.7
wrapt==1.15.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
zope.event==5.0
zope.interface==6.0
Hi @mikob, thanks for trying.
My bad: the error you hit is not actually related to the fix I mentioned, but to the environment setup, which caused the UNet compilation to fail.
From your reproduction steps, you are using the TensorFlow DLC for neuronx: 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference-neuronx:2.10.1-neuronx-py310-sdk2.14.1-ubuntu20.04. This is not a recommended DLC for optimum-neuron.
If possible, I would suggest using either:
[STRONGLY RECOMMENDED] Hugging Face - PyTorch Neuronx DLC (with optimum-neuron on board, so there is no need to worry about incompatible package versions, since they are tested before the release)
PyTorch DLC (763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.15.0-ubuntu20.04)
If you choose the PyTorch DLC, or your previous TensorFlow DLC with (manually installed?) PyTorch, you need to be very careful with package versions: for Neuron SDK 2.15.0, the matching optimum-neuron version is 0.0.13 (for other package versions, you can check my PR for creating the HF DLC).
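A quick way to catch this kind of mismatch before starting a long compilation is to compare the installed optimum-neuron version against the expected pin. The sketch below is only an illustration: the table contains the single pairing stated in this thread (Neuron SDK 2.15.0 with optimum-neuron 0.0.13), and the function names are hypothetical.

```python
from importlib import metadata

# Only the pairing stated in this thread; other entries would be assumptions.
EXPECTED_OPTIMUM_NEURON = {"2.15.0": "0.0.13"}


def check_optimum_neuron(sdk_version, installed_version, expected=EXPECTED_OPTIMUM_NEURON):
    """Return (ok, message) for the given SDK / optimum-neuron combination."""
    wanted = expected.get(sdk_version)
    if wanted is None:
        return False, f"No known optimum-neuron pin for Neuron SDK {sdk_version}."
    if installed_version is None:
        return False, f"optimum-neuron is not installed; {wanted} is expected for Neuron SDK {sdk_version}."
    if installed_version != wanted:
        return False, (
            f"optimum-neuron {installed_version} is installed, "
            f"but {wanted} is expected for Neuron SDK {sdk_version}."
        )
    return True, f"optimum-neuron {installed_version} matches Neuron SDK {sdk_version}."


def installed_optimum_neuron():
    """Return the installed optimum-neuron version, or None if it is missing."""
    try:
        return metadata.version("optimum-neuron")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    ok, message = check_optimum_neuron("2.15.0", installed_optimum_neuron())
    print(message)
```

The SDK version is passed in by hand here because it is encoded in the DLC tag (for example sdk2.15.0) rather than in a single Python package.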
[Ad]
The HuggingFace DLC for Neuron SDK 2.16 is merged and will be released soon, this will unblock more features we recently added to optimum-neuron!
According to the log, the compilation seems to be OK, but loading the UNet onto the Neuron device failed. The warning "Warning: Model was compiled with a newer version of torch-neuron than the current runtime" is very confusing to me, since torch-neuron is a dependency for inf1. If you are using inf2 (torch-neuronx), this kind of warning should not pop up. Did you see this before, @philschmid?
Will try to reproduce with the AMI and keep you posted.
One small tip for configuring your environment: try a tiny Stable Diffusion checkpoint first (e.g. hf-internal-testing/tiny-stable-diffusion-torch) to test the functionality before moving on to regular checkpoints. The tiny checkpoint takes only a few minutes to compile, which makes debugging much less painful.
@JingyaHuang thank you for checking on this.
[STRONGLY RECOMMENDED] Hugging Face - PyTorch Neuronx DLC
I think you put the wrong link, because that's the DLC that I used (as shown above in the Dockerfile). Did you mean this DLC? 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04, which is found here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-neuron-inference-containers.
Thanks especially for the tip on the tiny model, that will save lots of time!
Fixed the link, thanks. Yes, it's the HF PyTorch DLC: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04.
And when you test with the tiny model, the max height and width is reduced to 64, so something like:
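A minimal sketch of such a tiny-model smoke test, assuming the same from_pretrained(export=True, **input_shapes) call used elsewhere in this thread, with height and width set to 64. The function name and output directory are hypothetical, and the optimum-neuron import is kept inside the function so the file can be imported on a machine without the Neuron SDK.

```python
# Tiny checkpoint suggested above; it compiles in minutes rather than an hour.
TINY_MODEL_ID = "hf-internal-testing/tiny-stable-diffusion-torch"

# Static input shapes for the export; 64 is the maximum height/width for the tiny model.
tiny_input_shapes = {"batch_size": 1, "height": 64, "width": 64}


def export_tiny_pipeline(output_dir="sd_neuron_tiny/"):
    """Compile the tiny checkpoint for Neuron and save the artifacts (needs an inf2 environment)."""
    # Imported lazily: optimum-neuron is only available inside the Neuron environment.
    from optimum.neuron import NeuronStableDiffusionInpaintPipeline

    pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(
        TINY_MODEL_ID, export=True, **tiny_input_shapes
    )
    pipeline.save_pretrained(output_dir)
    return pipeline
```

If this export succeeds and the pipeline loads, the environment is most likely set up correctly, and the same call can be repeated with the full-size checkpoint and shapes.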
@JingyaHuang sadly still hitting the same problem with the DLC you mentioned and the sd v1.5 inpainting model.
I tried with the test model first, and that got past the compilation step just fine.
Error:
2024-02-13T00:38:21Z Compiler status PASS
[Compilation Time] 1029.84 seconds.
[Total compilation Time] 1524.62 seconds.
Traceback (most recent call last):
File "/app/test.py", line 13, in <module>
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
File "/opt/conda/lib/python3.10/site-packages/optimum/modeling_base.py", line 372, in from_pretrained
return from_pretrained_method(
File "/opt/conda/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 539, in _from_transformers
return cls._from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/optimum/neuron/modeling_diffusion.py", line 459, in _from_pretrained
dynamic_batch_size=neuron_configs[DIFFUSION_MODEL_UNET_NAME].dynamic_batch_size,
KeyError: 'unet'
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04
WORKDIR /app
ADD ./test.py ./
CMD ["python", "test.py"]
import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionInpaintPipeline
# compile & save
size = 1024
model_id = "runwayml/stable-diffusion-inpainting"
input_shapes = {"batch_size": 1, "height": size, "width": size}
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(model_id, export=True, **input_shapes)
# pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained('/sd_neuron')
pipeline.save_pretrained("sd_neuron/")
def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((size, size))
mask_image = download_image(mask_url).resize((size, size))
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")
@mikob which instance type are you using: inf2.xlarge or inf2.8xlarge? (inf2.xlarge can run out of memory while compiling the UNet on CPU, but it is the optimal choice for inference.)
@JingyaHuang inf2.8xlarge
Hi @mikob, I just tested the HuggingFace Neuron AMI with the following steps:
Compilation
optimum-cli export neuron --model runwayml/stable-diffusion-inpainting --task stable-diffusion --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 1 --auto_cast matmul --auto_cast_type bf16 sd_neuron/
(Compiled artifacts could be found here)
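For reference, a rough Python-API counterpart of the CLI export above could look like the sketch below. It assumes that from_pretrained accepts the compiler options auto_cast and auto_cast_type as keyword arguments next to the input shapes; num_images_per_prompt is left out, on the assumption that it defaults to 1. The function names are hypothetical.

```python
def export_kwargs(batch_size=1, height=1024, width=1024,
                  auto_cast="matmul", auto_cast_type="bf16"):
    """Build the keyword arguments mirroring the CLI flags used above."""
    compiler_args = {"auto_cast": auto_cast, "auto_cast_type": auto_cast_type}
    input_shapes = {"batch_size": batch_size, "height": height, "width": width}
    return {**compiler_args, **input_shapes}


def export_inpaint_pipeline(model_id="runwayml/stable-diffusion-inpainting",
                            output_dir="sd_neuron/"):
    """Compile the inpainting pipeline and save it (needs an inf2 environment)."""
    # Imported lazily: optimum-neuron is only available inside the Neuron environment.
    from optimum.neuron import NeuronStableDiffusionInpaintPipeline

    pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(
        model_id, export=True, **export_kwargs()
    )
    pipeline.save_pretrained(output_dir)
    return pipeline
```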
Inference
import requests
from PIL import Image
from io import BytesIO
from optimum.neuron import NeuronStableDiffusionInpaintPipeline
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained("sd_neuron/")
def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
size = 1024
init_image = download_image(img_url).resize((size, size))
mask_image = download_image(mask_url).resize((size, size))
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")
Both the compilation and the inference work as expected. I will also try the export with the NeuronStableDiffusionInpaintPipeline class and the DLC.
Also tested with the AMI + the snippet you used, it works fine as well without any extra env setup.
Env info below:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
aws-neuronx-collectives/unknown,now 2.19.7.0-530fb3064 amd64 [installed,upgradable to: 2.20.11.0-c101c322e]
aws-neuronx-dkms/unknown,now 2.15.9.0 amd64 [installed]
aws-neuronx-oci-hook/unknown,now 2.2.45.0 amd64 [installed]
aws-neuronx-runtime-lib/unknown,now 2.19.5.0-97e2d271b amd64 [installed,upgradable to: 2.20.11.0-b7d33e68b]
aws-neuronx-tools/unknown,now 2.16.1.0 amd64 [installed,upgradable to: 2.17.0.0]
aws-neuronx-runtime-discovery 2.9
diffusers 0.25.0
libneuronxla 0.5.669
neuronx-cc 2.12.68.0+4480452af
neuronx-distributed 0.6.0
neuronx-hwm 2.12.0.0+422c9037c
optimum-neuron 0.0.17
tensorboard-plugin-neuronx 2.6.1.0
torch 1.13.1
torch-neuronx 1.13.1.1.13.0
torch-xla 1.13.1+torchneurond
torchvision 0.14.1
transformers 4.36.2
transformers-neuronx 0.9.474
@JingyaHuang thank you for that, finally was able to get it to work with that combination! Performance was pretty disappointing, esp. with loading into the neuron core. I suppose the disk speed of the EBS is the bottleneck. I will leave it here for posterity for others going down this path:
Loading only U-Net into both Neuron Cores...
100%|██████████| 50/50 [00:33<00:00, 1.47it/s]
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 52.323758602142334s
downloading images...
done downloading images took 0.24345898628234863s
100%|██████████| 50/50 [00:25<00:00, 1.92it/s]
done generating image took 27.799940824508667s
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 44.28734588623047s
downloading images...
done downloading images took 0.6663944721221924s
100%|██████████| 35/35 [00:20<00:00, 1.72it/s]
done generating image took 22.7090163230896s
ubuntu@ip-172-30-0-123:~$ python test.py^C
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 49.001060009002686s
downloading images...
done downloading images took 0.3314809799194336s
100%|██████████| 35/35 [00:37<00:00, 1.08s/it]
done generating image took 39.03513050079346s
ubuntu@ip-172-30-0-123:~$ python test.py
loading pipeline...
Loading only U-Net into both Neuron Cores...
done loading pipeline took 34.034974098205566s
downloading images...
done downloading images took 0.42200231552124023s
100%|██████████| 35/35 [00:37<00:00, 1.06s/it]
done generating image took 38.421977281570435s
Hi @mikob, did you do a warm-up run before measuring inference? The first run takes longer. Tested on my end with 50 inference steps, it takes around 18s per image:
(aws_neuron_venv_2.16.1) ubuntu@ip-172-31-33-90:~/optimum-neuron$ python test_stable_diffusion.py
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 241051.95it/s]
Loading only U-Net into both Neuron Cores...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:21<00:00, 2.34it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:17<00:00, 2.91it/s]
[Inference Time] 18.48 seconds.
Generated 1 images.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:17<00:00, 2.91it/s]
[Inference Time] 18.47 seconds.
Generated 1 images.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:17<00:00, 2.91it/s]
[Inference Time] 18.47 seconds.
Generated 1 images.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:17<00:00, 2.91it/s]
[Inference Time] 18.48 seconds.
Generated 1 images.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:17<00:00, 2.91it/s]
[Inference Time] 18.48 seconds.
Generated 1 images.
import time
from io import BytesIO

import numpy as np
import requests
from PIL import Image
from optimum.neuron import NeuronStableDiffusionInpaintPipeline

# Load the precompiled Neuron artifacts
pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained("Jingya/stable-diffusion-inpainting-neuronx")

def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

size = 1024
init_image = download_image(img_url).resize((size, size))
mask_image = download_image(mask_url).resize((size, size))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
# Warm-up run: the first call is slower than the following ones
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench.png")

# Timed runs
for i in range(5):
    start_time = time.time()
    images = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images
    inf_time = time.time() - start_time
    print(f"[Inference Time] {np.round(inf_time, 2)} seconds.")
    print(f"Generated {len(images)} images.")
It's not that fast, but it is a bit faster than in your experiment. Besides, for faster inference you could also consider models like LCM or SDXL-Turbo, which require fewer inference steps.
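To keep the warm-up separate from the measured runs, the timing can be wrapped in a small generic helper. This is just a sketch using the standard library; the benchmark name and its parameters are hypothetical, and the pipeline call is passed in as a zero-argument callable.

```python
import time


def benchmark(fn, warmup=1, runs=5):
    """Call fn `warmup` times untimed, then return the wall-clock time of each of `runs` calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not measured
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings
```

With the pipeline from the snippet above, usage would be along the lines of benchmark(lambda: pipeline(prompt=prompt, image=init_image, mask_image=mask_image), warmup=1, runs=5), followed by printing the mean of the returned list.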
System Info
Trying to export a SD 1.5 inpainting model for inf2 as per the instructions via the python API I get this error:
The deps are from this docker container:
Also tried via the optimum CLI using this AMI: https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2 but I got an error:
compile commands