rupeshs / fastsdcpu

Fast stable diffusion on CPU
MIT License

FastSD CPU :sparkles:

Mentioned in Awesome OpenVINO

FastSD CPU is a faster version of Stable Diffusion on CPU, based on Latent Consistency Models and Adversarial Diffusion Distillation.

The following interfaces are available:

🚀 Using OpenVINO (SDXS-512-0.9), it took 0.82 seconds (820 milliseconds) to create a single 512x512 image on a Core i7-12700.

Supported platforms⚡️

FastSD CPU works on the following platforms:

Dependencies

Memory requirements

Minimum system RAM requirement for FastSD CPU.

Model (LCM, OpenVINO): SD Turbo, 1 step, 512 x 512

Model (LCM-LoRA): Dreamshaper v8, 3 steps, 512 x 512

| Mode | Min RAM |
| --- | --- |
| LCM | 2 GB |
| LCM-LoRA | 4 GB |
| OpenVINO | 11 GB |

Enabling the tiny decoder (TAESD) saves some memory (approximately 2 GB). For example, in OpenVINO mode memory usage drops to 9 GB.
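The figures above can be turned into a quick check of which modes fit on a given machine. This is a minimal sketch, not part of FastSD CPU: the function names are hypothetical, and the roughly 2 GB TAESD saving is applied to OpenVINO mode only, because that is the only mode for which a figure is given here.

```python
# Minimum RAM figures copied from the table above (in GB).
MIN_RAM_GB = {"LCM": 2, "LCM-LoRA": 4, "OpenVINO": 11}

# Approximate saving from the tiny decoder (TAESD). Only the OpenVINO figure
# (11 GB -> 9 GB) is documented, so other modes are left unchanged (assumption).
TAESD_SAVING_GB = {"OpenVINO": 2}


def min_ram_gb(mode: str, tiny_decoder: bool = False) -> int:
    """Return the estimated minimum RAM for a mode, in GB."""
    base = MIN_RAM_GB[mode]
    if tiny_decoder:
        base -= TAESD_SAVING_GB.get(mode, 0)
    return base


def modes_that_fit(available_gb: float, tiny_decoder: bool = False) -> list[str]:
    """List the modes whose estimated minimum RAM fits in available_gb."""
    return [m for m in MIN_RAM_GB if min_ram_gb(m, tiny_decoder) <= available_gb]


if __name__ == "__main__":
    print(modes_that_fit(8))
    print(modes_that_fit(10, tiny_decoder=True))
```

These are minimums for the listed models at 512 x 512; other models, resolutions, or guidance settings may need more.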

:exclamation: Please note that a guidance scale >1 increases RAM usage and slows inference.

Features

Fast Inference Benchmarks

🚀 Fast 1 step inference with Hyper-SD

Stable Diffusion 1.5

Works with LCM-LoRA mode. Fast 1 step inference is supported on the runwayml/stable-diffusion-v1-5 model; select the rupeshs/hypersd-sd1-5-1-step-lora lcm_lora model from the settings.

Stable Diffusion XL

Works with LCM and LCM-OpenVINO mode.

Inference Speed

Tested on a Core i7-12700, generating a 768x768 image (1 step).

| Diffusion Pipeline | Latency |
| --- | --- |
| PyTorch | 19s |
| OpenVINO | 13s |
| OpenVINO + TAESDXL | 6.3s |

Fastest 1 step inference (SDXS-512-0.9)

:exclamation: This is an experimental model; only the text-to-image workflow is supported.

Inference Speed

Tested on a Core i7-12700, generating a 512x512 image (1 step).

SDXS-512-0.9

| Diffusion Pipeline | Latency |
| --- | --- |
| PyTorch | 4.8s |
| OpenVINO | 3.8s |
| OpenVINO + TAESD | 0.82s |

🚀 Fast 1 step inference (SD/SDXL Turbo - Adversarial Diffusion Distillation, ADD)

Added support for ultra-fast 1 step inference using the sdxl-turbo model.

:exclamation: These SD Turbo models are intended for research purposes only.

Inference Speed

Tested on a Core i7-12700, generating a 512x512 image (1 step).

SD Turbo

| Diffusion Pipeline | Latency |
| --- | --- |
| PyTorch | 7.8s |
| OpenVINO | 5s |
| OpenVINO + TAESD | 1.7s |

SDXL Turbo

| Diffusion Pipeline | Latency |
| --- | --- |
| PyTorch | 10s |
| OpenVINO | 5.6s |
| OpenVINO + TAESDXL | 2.5s |

🚀 Fast 2 step inference (SDXL-Lightning - Adversarial Diffusion Distillation)

SDXL-Lightning works with LCM and LCM-OpenVINO mode. You can select these models from the app settings.

Tested on a Core i7-12700, generating a 768x768 image (2 steps).

| Diffusion Pipeline | Latency |
| --- | --- |
| PyTorch | 18s |
| OpenVINO | 12s |
| OpenVINO + TAESDXL | 10s |
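To compare the pipelines across models, the latencies reported above can be converted into speedup ratios relative to PyTorch. The sketch below only re-uses the numbers from the benchmark tables; the dictionary layout and function names are illustrative, and "tiny decoder" stands for TAESD or TAESDXL as listed in each table.

```python
# Latencies in seconds, copied from the benchmark tables above (Core i7-12700).
LATENCY_S = {
    "Hyper-SD SDXL (768x768, 1 step)": {
        "PyTorch": 19.0, "OpenVINO": 13.0, "OpenVINO + tiny decoder": 6.3},
    "SDXS-512-0.9 (512x512, 1 step)": {
        "PyTorch": 4.8, "OpenVINO": 3.8, "OpenVINO + tiny decoder": 0.82},
    "SD Turbo (512x512, 1 step)": {
        "PyTorch": 7.8, "OpenVINO": 5.0, "OpenVINO + tiny decoder": 1.7},
    "SDXL Turbo (512x512, 1 step)": {
        "PyTorch": 10.0, "OpenVINO": 5.6, "OpenVINO + tiny decoder": 2.5},
    "SDXL-Lightning (768x768, 2 steps)": {
        "PyTorch": 18.0, "OpenVINO": 12.0, "OpenVINO + tiny decoder": 10.0},
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def speedups_vs_pytorch(latencies: dict[str, float]) -> dict[str, float]:
    """Speedup of every non-PyTorch pipeline relative to PyTorch."""
    baseline = latencies["PyTorch"]
    return {name: speedup(baseline, seconds)
            for name, seconds in latencies.items() if name != "PyTorch"}


if __name__ == "__main__":
    for model, latencies in LATENCY_S.items():
        for pipeline, ratio in speedups_vs_pytorch(latencies).items():
            print(f"{model}: {pipeline} is {ratio:.2f}x faster than PyTorch")
```

The ratios are only as precise as the rounded latencies in the tables, and they apply to the tested processor.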

2 Steps fast inference (LCM)

FastSD CPU supports fast 2 to 3 step inference using the LCM-LoRA workflow. It works well with SD 1.5 models.

2 Steps inference

FLUX.1-schnell OpenVINO support

FLUX Schnell OpenVINO

:exclamation: Important - Please note the following points about the FLUX workflow:

Tested on an Intel Core i7-12700, generating a 512x512 image (3 steps).

| Diffusion Pipeline | Latency |
| --- | --- |
| OpenVINO | 4 min 30 s |

Benchmark scripts

To benchmark, run the following batch file on Windows:

Alternatively, you can run benchmarks by passing the -b command line argument in CLI mode.

OpenVINO support

FastSD CPU uses OpenVINO to speed up inference. Thanks to deinferno for the OpenVINO model contribution. We can get a 2x speed improvement when using OpenVINO. Thanks to Disty0 for the conversion script.

OpenVINO SDXL models

These models have been converted for direct use with FastSD CPU. They are compressed to int8 using NNCF to reduce the file size (from 10 GB to 4.4 GB).

OpenVINO SD Turbo models

We have converted the SD/SDXL Turbo models to OpenVINO for fast inference on CPU. These models are intended for research purposes only. We also converted the TAESDXL model to OpenVINO and

You can directly use these models in FastSD CPU.

Convert SD 1.5 models to OpenVINO LCM-LoRA fused models

We first create an LCM-LoRA baked-in model, replace the scheduler with LCM, and then convert it into an OpenVINO model. For more details, check LCM OpenVINO Converter; you can use this tool to convert any Stable Diffusion 1.5 fine-tuned model to OpenVINO.
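The three steps described above (bake in the LCM-LoRA, swap the scheduler, convert to OpenVINO) could look roughly like the following. This is a sketch using the public diffusers and optimum-intel APIs, not the code of the LCM OpenVINO Converter itself; the function name and the directory arguments are hypothetical, and both packages must be installed for it to run.

```python
def fuse_lcm_lora_and_export(
    base_model_id: str,
    lcm_lora_id: str,
    fused_dir: str,
    openvino_dir: str,
) -> None:
    """Fuse an LCM-LoRA into an SD 1.5 model, then export it to OpenVINO."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from diffusers import LCMScheduler, StableDiffusionPipeline
    from optimum.intel import OVStableDiffusionPipeline

    # 1. Load the base model and bake the LCM-LoRA weights into it.
    pipe = StableDiffusionPipeline.from_pretrained(base_model_id)
    pipe.load_lora_weights(lcm_lora_id)
    pipe.fuse_lora()
    pipe.unload_lora_weights()

    # 2. Replace the scheduler with the LCM scheduler and save the result.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.save_pretrained(fused_dir)

    # 3. Convert the fused model to OpenVINO and save it.
    ov_pipe = OVStableDiffusionPipeline.from_pretrained(fused_dir, export=True)
    ov_pipe.save_pretrained(openvino_dir)
```

A call might pass runwayml/stable-diffusion-v1-5 as the base model and latent-consistency/lcm-lora-sdv1-5 as the LoRA, with two output folders of your choice. The first run downloads the model weights, so it needs network access and several GB of disk space.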

Intel AI PC support - OpenVINO (CPU, GPU, NPU)

Fast SD now supports AI PC with Intel® Core™ Ultra Processors. To learn more about AI PC and OpenVINO.

GPU

For GPU mode, set device=GPU and run the web UI. The FastSD GPU benchmark on an AI PC is shown below.

FastSD AI PC Arc GPU benchmark

NPU

FastSD CPU now supports the power-efficient NPU (Neural Processing Unit) that comes with Intel Core Ultra processors.

FastSD was tested with the NPUs of the following Intel processors:

Currently FastSD supports this model for NPU: rupeshs/sd15-lcm-square-openvino-int8.

The following modes are supported on NPU:

To run a model on the NPU, follow these steps (please make sure that your AI PC's NPU driver is the latest):

This is heterogeneous computing, since the text encoder and UNet use the NPU and the VAE uses the GPU for processing. Thanks to OpenVINO.

Please note that the tiny auto encoder will not work in NPU mode.

Thanks to Intel for providing the AI PC dev kit and Tiber cloud access to test FastSD, with special thanks to Pooja Baraskar and Dmitriy Pastushenkov.

GGUF support - Flux

GGUF Flux models are supported via the stablediffusion.cpp shared library. Currently the Flux Schnell model is supported.

To use a GGUF model, open the web UI and select GGUF mode.

Tested on Windows and Linux.

:exclamation: The main advantage is that the minimum system RAM required for the Flux workflow is reduced to around 12 GB.

Supported mode - Text to image

How to run Flux GGUF model

Build stablediffusion.cpp shared library for GGUF Flux model support (optional)

To build the stablediffusion.cpp library, follow these steps:

Real-time text to image (EXPERIMENTAL)

We can generate images from text in real time using FastSD CPU.

CPU (OpenVINO)

For near real-time inference on CPU using OpenVINO, run the start-realtime.bat batch file and open the link in a browser (resolution: 512x512, latency: 0.82s on Intel Core i7).

Watch YouTube video :

Models

To use single-file Safetensors SD 1.5 models (Civitai), follow this YouTube tutorial. Use LCM-LoRA mode for single-file safetensors.

Fast SD supports LCM models and LCM-LoRA models.

LCM Models

These models can be configured in the configs/lcm-models.txt file.

OpenVINO models

These are LCM-LoRA baked-in models. They can be configured in the configs/openvino-lcm-models.txt file.

LCM-LoRA models

These models can be configured in the configs/lcm-lora-models.txt file.

These models are used with the Stable Diffusion base models listed in configs/stable-diffusion-models.txt.

:exclamation: Currently no support for OpenVINO LCM-LoRA models.

How to add new LCM-LoRA models

To add a new model, follow the steps below. As an example we will add wavymulder/collage-diffusion; you can use Stable Diffusion 1.5, SDXL, or SSD-1B fine-tuned models.

  1. Open the configs/stable-diffusion-models.txt file in a text editor.
  2. Add the model ID wavymulder/collage-diffusion or a locally cloned path.

The updated file is shown below:

Fictiverse/Stable_Diffusion_PaperCut_Model
stabilityai/stable-diffusion-xl-base-1.0
runwayml/stable-diffusion-v1-5
segmind/SSD-1B
stablediffusionapi/anything-v5
wavymulder/collage-diffusion

Similarly, we can update the configs/lcm-lora-models.txt file with the LCM-LoRA ID.
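The manual edit described above can also be scripted. The following is a small sketch, assuming the config files are plain text with one model ID or local path per line, as in the listing above; the helper names are hypothetical and not part of FastSD CPU.

```python
from pathlib import Path


def with_model_id(lines: list[str], model_id: str) -> list[str]:
    """Return the non-empty entries, with model_id appended if it is missing."""
    entries = [line.strip() for line in lines if line.strip()]
    if model_id not in entries:
        entries.append(model_id)
    return entries


def add_model_id(config_file: str, model_id: str) -> None:
    """Add model_id (a hub ID or a local path) to a one-entry-per-line config."""
    path = Path(config_file)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    path.write_text("\n".join(with_model_id(lines, model_id)) + "\n", encoding="utf-8")
```

For example, `add_model_id("configs/stable-diffusion-models.txt", "wavymulder/collage-diffusion")` adds the base model, and the same call with configs/lcm-lora-models.txt adds an LCM-LoRA ID. Running it twice leaves a single entry.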

How to use LCM-LoRA models offline

Follow these steps to run LCM-LoRA models offline:

git lfs install
git clone https://huggingface.co/latent-consistency/lcm-lora-sdv1-5

Copy the cloned model folder path, for example "D:\demo\lcm-lora-sdv1-5", and update the configs/lcm-lora-models.txt file as shown below:

D:\demo\lcm-lora-sdv1-5
latent-consistency/lcm-lora-sdxl
latent-consistency/lcm-lora-ssd-1b

How to use LoRA models

Place your LoRA models in the "lora_models" folder. Use LCM or LCM-LoRA mode. You can download LoRA models (.safetensors/Safetensor) from Civitai or Hugging Face, e.g. cutecartoonredmond.
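Before starting the app, it can help to confirm that the files landed in the right place. This is a standalone sanity check, not how FastSD CPU itself scans the folder (that is not described here); the helper names are hypothetical and only the .safetensors extension is checked.

```python
from pathlib import Path

LORA_SUFFIX = ".safetensors"


def is_lora_file(name: str) -> bool:
    """True if the file name has the .safetensors extension (case-insensitive)."""
    return name.lower().endswith(LORA_SUFFIX)


def find_lora_files(folder: str = "lora_models") -> list[str]:
    """Return the sorted names of .safetensors files directly inside folder."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file() and is_lora_file(p.name))


if __name__ == "__main__":
    print(find_lora_files())
```

An empty list means either the folder does not exist relative to the current directory or it contains no .safetensors files.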

ControlNet support

We can use ControlNet in LCM-LoRA mode.

Download ControlNet models from ControlNet-v1-1 and place them in the "controlnet_models" folder.

Use the medium-size models (723 MB), for example: https://huggingface.co/comfyanonymous/ControlNet-v1-1_fp16_safetensors/blob/main/control_v11p_sd15_canny_fp16.safetensors

Installation

FastSD CPU on Windows

FastSD CPU Desktop GUI Screenshot

:exclamation: You must have a working Python installation (recommended: Python 3.10 or 3.11).

To install FastSD CPU on Windows, follow these steps:

Desktop GUI

Web UI

FastSD CPU on Linux

:exclamation: Ensure that you have Python 3.9, 3.10, or 3.11 installed.
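A quick way to check the interpreter before installing is sketched below. It only encodes the versions named above; the function name is illustrative, and whether the install scripts perform a similar check is not stated here.

```python
import sys

# Python versions listed as supported in this section.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported(version=None) -> bool:
    """Check a (major, minor, ...) tuple, defaulting to the running interpreter."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "not listed as supported"
    print(f"Python {current} is {status}")
```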

To start Desktop GUI

./start.sh

To start Web UI

./start-webui.sh

FastSD CPU on Mac

FastSD CPU running on Mac

:exclamation: Ensure that you have Python 3.9, 3.10, or 3.11 installed.

Run the following commands to install FastSD CPU on Mac:

To start Desktop GUI

./start.sh

To start Web UI

./start-webui.sh

Thanks to Autantpourmoi for Mac testing.

:exclamation: We don't support OpenVINO on Macs with M1/M2/M3 chips, but it does work on Intel chips.

If you want to increase image generation speed on Mac (M1/M2 chip), try this:

Run export DEVICE=mps and then start the app with start.sh.
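An environment variable such as DEVICE is typically read once at startup, with a fallback when it is unset. The sketch below illustrates that pattern only; the function name and the "cpu" default are assumptions, and the actual lookup inside FastSD CPU is not shown in this document.

```python
import os


def resolve_device(env=None, default: str = "cpu") -> str:
    """Return the DEVICE value from the environment, or the default if unset."""
    env = os.environ if env is None else env
    return env.get("DEVICE", default)


if __name__ == "__main__":
    # With `export DEVICE=mps` in the shell, this prints "mps".
    print(resolve_device())
```

Because `export` only affects the current shell session, the variable has to be set in the same terminal that launches start.sh.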

Web UI screenshot

FastSD CPU WebUI Screenshot

Google Colab

Due to the limitations of using CPU/OpenVINO inside Colab, we use the GPU with Colab.

CLI mode (Advanced users)

FastSD CPU CLI Screenshot

Open a terminal and change into the fastsdcpu folder. Activate the virtual environment using the command:

Windows users

(Assuming FastSD CPU is available in the directory "D:\fastsdcpu")

D:\fastsdcpu\env\Scripts\activate.bat

Linux users

source env/bin/activate

Start the CLI: python src/app.py -h

Android (Termux + PRoot)

FastSD CPU running on Google Pixel 7 Pro.

FastSD CPU Android Termux Screenshot

1. Prerequisites

First install Termux and PRoot. Then install Ubuntu in PRoot and log in to it.

2. Install FastSD CPU

Run the following commands to install without the Qt GUI.

proot-distro login ubuntu

./install.sh --disable-gui

After the installation you can use the web UI.

./start-webui.sh

Note: If you get a libgl.so.1 import error, run apt-get install ffmpeg.

Thanks to patienx for this guide: Step by step guide to installing FASTSDCPU on ANDROID

Another step by step guide to run FastSD on Android is here

Raspberry PI 4 support

Thanks to WGNW_MGM for Raspberry Pi 4 testing. FastSD CPU worked without problems. System configuration: Raspberry Pi 4 with 4GB RAM and 8GB of swap memory.

Orange Pi 5 support

Thanks to khanumballz for testing FastSD CPU with Orange Pi 5. Here is a video of FastSD CPU running on Orange Pi 5.

API support

FastSD CPU API documentation

FastSD CPU supports basic API endpoints. The following API endpoints are available:

To start FastAPI in webserver mode, run: python src/app.py --api

or use start-webserver.sh for Linux and start-webserver.bat for Windows.

Access the API documentation locally at http://localhost:8000/api/docs .

The generated image is a JPEG encoded as a base64 string. In image-to-image mode, the input image should also be encoded as a base64 string.

To generate an image, send a minimal POST /api/generate request with the following body:

{
    "prompt": "a cute cat",
    "use_openvino": true
}
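A client for this request could be sketched with the Python standard library as follows. The URL, endpoint, and body fields come from the text above; the response field name `images` is an assumption (check the schema at /api/docs), and the helper names are hypothetical.

```python
import base64
import json
import urllib.request

API_URL = "http://localhost:8000/api/generate"


def build_request(prompt: str, use_openvino: bool = True) -> urllib.request.Request:
    """Build the minimal POST /api/generate request shown above."""
    body = json.dumps({"prompt": prompt, "use_openvino": use_openvino}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response: dict, key: str = "images") -> list[bytes]:
    """Decode base64 JPEG strings; the field name `images` is an assumption."""
    return [base64.b64decode(item) for item in response.get(key, [])]


def generate(prompt: str) -> list[bytes]:
    """Send the request to a running FastSD CPU API server and decode the result."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return decode_images(json.load(resp))
```

With the server started via python src/app.py --api, calling `generate("a cute cat")` should return a list of JPEG byte strings that can be written to .jpg files. Inference on CPU can take a while, so a long-running request is expected.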

Known issues

License

The fastsdcpu project is available as open source under the terms of the MIT license.

Disclaimer

Users are granted the freedom to create images using this tool, but they are obligated to comply with local laws and utilize it responsibly. The developers will not assume any responsibility for potential misuse by users.

Thanks to all our contributors

Original Author & Maintainer - Rupesh Sreeraman

We thank all contributors for their time and hard work!