FastSD CPU is a faster version of Stable Diffusion on CPU, based on Latent Consistency Models and Adversarial Diffusion Distillation.
The following interfaces are available :
🚀 Using OpenVINO (SDXS-512-0.9), it took 0.82 seconds (820 milliseconds) to create a single 512x512 image on a Core i7-12700.
FastSD CPU works on the following platforms:
Minimum system RAM requirement for FastSD CPU.
Model (LCM,OpenVINO): SD Turbo, 1 step, 512 x 512
Model (LCM-LoRA): Dreamshaper v8, 3 step, 512 x 512
Mode | Min RAM |
---|---|
LCM | 2 GB |
LCM-LoRA | 4 GB |
OpenVINO | 11 GB |
Enabling the tiny decoder (TAESD) saves some memory (approximately 2 GB). For example, in OpenVINO mode memory usage drops to 9 GB.
:exclamation: Please note that a guidance scale >1 increases RAM usage and slows inference.
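The RAM table and the TAESD note above can be turned into a quick check of which modes fit a given machine. This is only a sketch: the function name is illustrative and not part of FastSD CPU, and it assumes the roughly 2 GB saving applies to OpenVINO mode alone, since that is the only case the text gives.

```python
# Minimum RAM in GB per mode, copied from the table above.
MIN_RAM_GB = {"LCM": 2, "LCM-LoRA": 4, "OpenVINO": 11}

# Approximate saving from the tiny decoder (TAESD). The text only states
# this figure for OpenVINO mode, so this sketch applies it there alone.
TAESD_SAVING_GB = 2


def modes_that_fit(available_ram_gb, use_tiny_decoder=False):
    """Return the modes whose minimum RAM requirement is met."""
    fitting = []
    for mode, required in MIN_RAM_GB.items():
        if use_tiny_decoder and mode == "OpenVINO":
            required -= TAESD_SAVING_GB
        if available_ram_gb >= required:
            fitting.append(mode)
    return fitting


print(modes_that_fit(8))
print(modes_that_fit(9, use_tiny_decoder=True))
```

With 8 GB only the LCM and LCM-LoRA modes qualify. With 9 GB and the tiny decoder enabled, OpenVINO mode qualifies as well.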
Works with LCM-LoRA mode.
Fast 1 step inference is supported on the runwayml/stable-diffusion-v1-5 model. Select rupeshs/hypersd-sd1-5-1-step-lora as the lcm_lora model in the settings.
Works with LCM and LCM-OpenVINO mode.
Hyper-SD SDXL 1 step - rupeshs/hyper-sd-sdxl-1-step
Hyper-SD SDXL 1 step OpenVINO - rupeshs/hyper-sd-sdxl-1-step-openvino-int8
Tested on Core i7-12700 to generate 768x768 image(1 step).
Diffusion Pipeline | Latency |
---|---|
Pytorch | 19s |
OpenVINO | 13s |
OpenVINO + TAESDXL | 6.3s |
:exclamation: This is an experimental model; only the text-to-image workflow is supported.
Tested on Core i7-12700 to generate 512x512 image(1 step).
SDXS-512-0.9
Diffusion Pipeline | Latency |
---|---|
Pytorch | 4.8s |
OpenVINO | 3.8s |
OpenVINO + TAESD | 0.82s |
Added support for ultra fast 1 step inference using the sdxl-turbo model.
:exclamation: These SD turbo models are intended for research purposes only.
Tested on Core i7-12700 to generate 512x512 image(1 step).
SD Turbo
Diffusion Pipeline | Latency |
---|---|
Pytorch | 7.8s |
OpenVINO | 5s |
OpenVINO + TAESD | 1.7s |
SDXL Turbo
Diffusion Pipeline | Latency |
---|---|
Pytorch | 10s |
OpenVINO | 5.6s |
OpenVINO + TAESDXL | 2.5s |
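To compare the pipelines in the two tables above, the speedup over Pytorch can be computed directly from the listed latencies. The snippet below is a small illustrative calculation; the dictionary layout and function name are assumptions made for this example.

```python
# Latencies in seconds for 1 step 512x512 generation on a Core i7-12700,
# copied from the SD Turbo and SDXL Turbo tables above.
LATENCY_S = {
    "SD Turbo": {"Pytorch": 7.8, "OpenVINO": 5.0, "OpenVINO + tiny decoder": 1.7},
    "SDXL Turbo": {"Pytorch": 10.0, "OpenVINO": 5.6, "OpenVINO + tiny decoder": 2.5},
}


def speedups(latencies, baseline="Pytorch"):
    """Return the baseline latency divided by each pipeline's latency."""
    base = latencies[baseline]
    return {name: base / value for name, value in latencies.items()}


for model, latencies in LATENCY_S.items():
    for pipeline, factor in speedups(latencies).items():
        print(f"{model:10s} {pipeline:25s} {factor:.2f}x")
```

For SD Turbo this works out to roughly 1.6x for OpenVINO alone and roughly 4.6x for OpenVINO with the tiny decoder.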
SDXL-Lightning works with LCM and LCM-OpenVINO mode. You can select these models from the app settings.
Tested on Core i7-12700 to generate 768x768 image(2 steps).
Diffusion Pipeline | Latency |
---|---|
Pytorch | 18s |
OpenVINO | 12s |
OpenVINO + TAESDXL | 10s |
SDXL-Lightning - rupeshs/SDXL-Lightning-2steps
SDXL-Lightning OpenVINO - rupeshs/SDXL-Lightning-2steps-openvino-int8
FastSD CPU supports fast 2 to 3 step inference using the LCM-LoRA workflow. It works well with SD 1.5 models.
To benchmark, run the following batch files on Windows:
- benchmark.bat - To benchmark PyTorch
- benchmark-openvino.bat - To benchmark OpenVINO

Alternatively, you can run benchmarks by passing the -b command line argument in CLI mode.
FastSD CPU uses OpenVINO to speed up inference. Thanks to deinferno for the OpenVINO model contribution. We can get a 2x speed improvement when using OpenVINO. Thanks to Disty0 for the conversion script.
These models are converted for direct use with FastSD CPU. They are compressed to int8 using NNCF to reduce the file size (10 GB to 4.4 GB).
We have converted SD/SDXL Turbo models to OpenVINO for fast inference on CPU. These models are intended for research purposes only. We also converted the TAESDXL model to OpenVINO.
You can use these models directly in FastSD CPU.
We first create an LCM-LoRA baked in model, replace the scheduler with LCM, and then convert it into an OpenVINO model. For more details check LCM OpenVINO Converter; you can use this tool to convert any Stable Diffusion 1.5 fine-tuned model to OpenVINO.
We can generate real-time text to images using FastSD CPU.
CPU (OpenVINO)
For near real-time inference on CPU using OpenVINO, run the start-realtime.bat batch file and open the link in a browser (resolution: 512x512, latency: 0.82s on Intel Core i7).
Watch YouTube video :
To use single-file Safetensors SD 1.5 models (Civitai), follow this YouTube tutorial. Use LCM-LoRA mode for single-file safetensors.
Fast SD supports LCM models and LCM-LoRA models.
LCM models can be configured in the configs/lcm-models.txt file.
OpenVINO models are LCM-LoRA baked in models. They can be configured in the configs/openvino-lcm-models.txt file.
LCM-LoRA models can be configured in the configs/lcm-lora-models.txt file.
LCM-LoRA models are used with Stable Diffusion base models, which are listed in the configs/stable-diffusion-models.txt file.
:exclamation: OpenVINO LCM-LoRA models are currently not supported.
To add a new model, follow these steps. For example, we will add wavymulder/collage-diffusion; you can use Stable Diffusion 1.5, SDXL, or SSD-1B fine-tuned models.
Open the configs/stable-diffusion-models.txt file in a text editor and add the model ID wavymulder/collage-diffusion or a locally cloned path. The updated file is shown below:
Fictiverse/Stable_Diffusion_PaperCut_Model
stabilityai/stable-diffusion-xl-base-1.0
runwayml/stable-diffusion-v1-5
segmind/SSD-1B
stablediffusionapi/anything-v5
wavymulder/collage-diffusion
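The same edit can be scripted. The helper below is a hypothetical convenience and not part of FastSD CPU; it assumes the config file is a plain list with one model ID or local path per line, as shown above.

```python
from pathlib import Path


def add_model_id(config_path, model_id):
    """Append model_id to the config file unless it is already listed.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(config_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    # Compare against stripped lines so stray whitespace does not cause duplicates.
    if model_id in (line.strip() for line in lines):
        return False
    lines.append(model_id)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Calling add_model_id("configs/stable-diffusion-models.txt", "wavymulder/collage-diffusion") from the fastsdcpu folder would append the ID once; calling it again leaves the file unchanged.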
Similarly, we can update the configs/lcm-lora-models.txt file with an LCM-LoRA ID.
Follow these steps to run LCM-LoRA models offline.
Download the model, for example latent-consistency/lcm-lora-sdv1-5, by running the following commands:
git lfs install
git clone https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
Copy the cloned model folder path, for example "D:\demo\lcm-lora-sdv1-5", and update the configs/lcm-lora-models.txt file as shown below:
D:\demo\lcm-lora-sdv1-5
latent-consistency/lcm-lora-sdxl
latent-consistency/lcm-lora-ssd-1b
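As an illustration of the last step, the snippet below swaps a Hugging Face model ID for a locally cloned path in a list of config lines. The function name and the assumed original file contents (the three hub IDs) are hypothetical; only the resulting lines come from the listing above.

```python
def use_local_model(lines, model_id, local_path):
    """Return the config lines with model_id replaced by local_path."""
    return [local_path if line.strip() == model_id else line for line in lines]


# Assumed contents of configs/lcm-lora-models.txt before the change.
original = [
    "latent-consistency/lcm-lora-sdv1-5",
    "latent-consistency/lcm-lora-sdxl",
    "latent-consistency/lcm-lora-ssd-1b",
]

updated = use_local_model(
    original, "latent-consistency/lcm-lora-sdv1-5", r"D:\demo\lcm-lora-sdv1-5"
)
print("\n".join(updated))
```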
Place your LoRA models in the "lora_models" folder. Use LCM or LCM-LoRA mode. You can download LoRA models (.safetensors) from Civitai or Hugging Face, e.g. cutecartoonredmond.
We can use ControlNet in LCM-LoRA mode.
Download ControlNet models from ControlNet-v1-1 and place them in the "controlnet_models" folder.
Use the medium size models (723 MB), for example: https://huggingface.co/comfyanonymous/ControlNet-v1-1_fp16_safetensors/blob/main/control_v11p_sd15_canny_fp16.safetensors
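Placing a downloaded model in the folder can also be done from a script. The helper below is a hypothetical sketch, not part of FastSD CPU; it assumes the model is a single .safetensors file, as in the example link, and that it is run from the fastsdcpu folder.

```python
import shutil
from pathlib import Path


def install_controlnet_model(downloaded_file, models_dir="controlnet_models"):
    """Copy a downloaded .safetensors file into the ControlNet models folder."""
    source = Path(downloaded_file)
    if source.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got {source.name}")
    target_dir = Path(models_dir)
    # Create the folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))
```

The function returns the path of the copied file, so the caller can confirm where the model ended up.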
:exclamation: You must have a working Python installation (recommended: Python 3.10 or 3.11).
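A quick way to confirm the installed interpreter matches the versions named in this README is sketched below. The function name and the three labels it returns are illustrative choices for this example.

```python
import sys

# Versions named in this README: 3.9, 3.10 and 3.11 are listed for Linux and
# Mac, and 3.10 or 3.11 are the recommended ones.
SUPPORTED = {(3, 9), (3, 10), (3, 11)}
RECOMMENDED = {(3, 10), (3, 11)}


def check_python(version=None):
    """Classify a (major, minor) version as recommended, supported or untested."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor in RECOMMENDED:
        return "recommended"
    if major_minor in SUPPORTED:
        return "supported"
    return "untested"


print(check_python())
```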
To install FastSD CPU on Windows, run the following steps:
- Run install.bat (it will take some time to install, depending on your internet speed).
- Run start.bat to start the desktop GUI.
- Run start-webui.bat to start the web UI.
:exclamation: Ensure that you have Python 3.9, 3.10, or 3.11 installed.
Run the following commands:
chmod +x install.sh
./install.sh
./start.sh
./start-webui.sh
:exclamation: Ensure that you have Python 3.9, 3.10, or 3.11 installed.
Run the following commands to install FastSD CPU on Mac:
chmod +x install-mac.sh
./install-mac.sh
./start.sh
./start-webui.sh
Thanks Autantpourmoi for Mac testing.
:exclamation: OpenVINO is not supported on Macs with M1/M2/M3 chips, but it does work on Intel chips.
If you want to increase image generation speed on a Mac (M1/M2 chip), set the device before starting the app:
export DEVICE=mps
Then start the app with start.sh.
Due to the limitations of using CPU/OpenVINO inside Colab, we are using a GPU with Colab.
Open the terminal and go to the fastsdcpu folder. Activate the virtual environment using the command for your platform.
Windows (assuming FastSD CPU is in the directory "D:\fastsdcpu"):
D:\fastsdcpu\env\Scripts\activate.bat
Linux:
source env/bin/activate
Start the CLI: python src/app.py -h
FastSD CPU running on Google Pixel 7 Pro.
First, install Termux and PRoot. Then install and log in to Ubuntu in PRoot.
Run the following commands to install without the Qt GUI:
proot-distro login ubuntu
./install.sh --disable-gui
After the installation you can use the web UI.
./start-webui.sh
Note: If you get a libgl.so.1 import error, run apt-get install ffmpeg.
Thanks to patienx for this guide: Step by step guide to installing FASTSDCPU on ANDROID.
Another step by step guide to run FastSD on Android is here
Thanks to [WGNW_MGM] for Raspberry Pi 4 testing. FastSD CPU worked without problems. System configuration: Raspberry Pi 4 with 4 GB RAM and 8 GB of swap memory.
Thanks to khanumballz for testing FastSD CPU with Orange Pi 5. Here is a video of FastSD CPU running on Orange Pi 5.
FastSD CPU supports basic API endpoints. The following API endpoints are available:
To start FastAPI in webserver mode, run:
python src/app.py --api
or use start-webserver.sh for Linux and start-webserver.bat for Windows.
Access API documentation locally at http://localhost:8000/api/docs .
The generated image is a JPEG image encoded as a base64 string. In image-to-image mode, the input image should also be encoded as a base64 string.
To generate an image, send a minimal POST /api/generate request with the following body:
{
"prompt": "a cute cat",
"use_openvino": true
}
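A minimal client for the request above might look like the following sketch, which uses only the Python standard library. It assumes the server runs on the same host and port as the documentation URL (http://localhost:8000). The response layout is also an assumption: the code expects a JSON object with a list of base64 strings under a key named "images", which should be verified against /api/docs. All function names are illustrative.

```python
import base64
import json
import urllib.request

# Assumed from the documentation URL above plus the /api/generate path.
API_URL = "http://localhost:8000/api/generate"


def build_request(prompt, use_openvino=True, url=API_URL):
    """Build a POST request carrying the minimal JSON body."""
    body = json.dumps({"prompt": prompt, "use_openvino": use_openvino}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def encode_image(image_bytes):
    """Base64-encode an input image for image-to-image requests."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image(image_b64):
    """Decode a base64 string from the response into raw JPEG bytes."""
    return base64.b64decode(image_b64)


def generate(prompt, response_key="images"):
    """Send the request and return the decoded images as bytes.

    ASSUMPTION: the response is a JSON object whose response_key entry is a
    list of base64 strings. Check /api/docs for the actual schema.
    """
    with urllib.request.urlopen(build_request(prompt)) as response:
        payload = json.load(response)
    return [decode_image(item) for item in payload[response_key]]
```

With the server running, generate("a cute cat") would return a list of byte strings that can be written to .jpg files. Nothing is sent until generate is called, so the helpers can be imported safely without a server.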
The fastsdcpu project is available as open source under the terms of the MIT license.
Users are granted the freedom to create images using this tool, but they are obligated to comply with local laws and utilize it responsibly. The developers will not assume any responsibility for potential misuse by users.