siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Dev nodes nexfort booster #911

Closed ccssu closed 3 months ago

ccssu commented 4 months ago

Nexfort

cd ComfyUI

# For CUDA Graph
export NEXFORT_FX_CUDAGRAPHS=1

# For best performance
export TORCHINDUCTOR_MAX_AUTOTUNE=1
# Enable CUDNN benchmark
export NEXFORT_FX_CONV_BENCHMARK=1
# Faster float32 matmul
export NEXFORT_FX_MATMUL_ALLOW_TF32=1

# For graph cache to speed up compilation
export TORCHINDUCTOR_FX_GRAPH_CACHE=1

# For persistent cache dir
export TORCHINDUCTOR_CACHE_DIR=~/.torchinductor

# Debug options (uncomment as needed)
# export TORCH_LOGS="+dynamo"
# export TORCHDYNAMO_VERBOSE=1
# export NEXFORT_DEBUG=1 NEXFORT_FX_DUMP_GRAPH=1 TORCH_COMPILE_DEBUG=1

python main.py --gpu-only --disable-cuda-malloc --port 8188 --cuda-device 6
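
Note: to my understanding (an assumption, not stated in this PR), NEXFORT_FX_CONV_BENCHMARK and NEXFORT_FX_MATMUL_ALLOW_TF32 correspond to the standard PyTorch backend flags, so a rough in-process equivalent looks like this:

import torch

# Rough in-process equivalents of the env switches above (assumption:
# the NEXFORT_FX_* variables map onto these standard PyTorch flags).
torch.backends.cudnn.benchmark = True         # let cuDNN autotune conv algorithms
torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 for float32 matmuls
torch.backends.cudnn.allow_tf32 = True        # allow TF32 inside cuDNN kernels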

How to use Nexfort

Case 1

# Compile arbitrary models (torch.nn.Module)
import torch
import onediff.infer_compiler as infer_compiler

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))

mod = MyModule().to("cuda").half()
with torch.inference_mode():
    compiled_mod = infer_compiler.compile(mod,
        backend="nexfort",
        options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
    )
    print(compiled_mod(torch.randn(10, 100, device="cuda").half()))
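
The same call can be applied to real diffusion model components. A minimal sketch (the model id, prompt, and the choice of compiling only the UNet are illustrative, not taken from this PR):

import torch
from diffusers import StableDiffusionXLPipeline
import onediff.infer_compiler as infer_compiler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Compile only the UNet, which dominates denoising time.
pipe.unet = infer_compiler.compile(
    pipe.unet,
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)

image = pipe("a photo of a cat", num_inference_steps=30).images[0]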

Case 2

import torch
import onediff.infer_compiler as infer_compiler
@infer_compiler.compile(
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)
def foo(x):
    return torch.sin(x) + torch.cos(x)

print(foo(torch.randn(10, 10, device="cuda").half()))
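
Because the first call through a compiled function triggers nexfort/Inductor compilation (see the "First compilation time" numbers below), it is worth running a warm-up call before measuring. A small timing sketch, reusing foo from above:

import time
import torch

x = torch.randn(10, 10, device="cuda").half()

# First call compiles; expect it to be much slower than later calls.
t0 = time.time()
foo(x)
torch.cuda.synchronize()
print(f"first (compile) call: {time.time() - t0:.2f} s")

# Subsequent calls reuse the compiled graph.
t0 = time.time()
foo(x)
torch.cuda.synchronize()
print(f"warm call: {time.time() - t0:.4f} s")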

VAE

ComfyUI Workflow

[speedup_vae workflow image]

Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| --- | --- | --- | --- |
| NVIDIA GeForce RTX 4090 | 3.02 s | 2.95 s | 2.31% |

First compilation time: 321.92 seconds

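The workflow above applies the nexfort booster to the VAE only. Outside ComfyUI, a roughly equivalent sketch (diffusers AutoencoderKL and the model id are illustrative assumptions, not the node implementation) compiles just the decode path:

import torch
from diffusers import AutoencoderKL
import onediff.infer_compiler as infer_compiler

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
).to("cuda")

# Compile the decoder sub-module; encode is left untouched.
vae.decoder = infer_compiler.compile(
    vae.decoder,
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)

with torch.inference_mode():
    latents = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.float16)
    image = vae.decode(latents / vae.config.scaling_factor).sample  # 1024x1024 output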

LoRA

ComfyUI Workflow

[speedup_vae_unet workflow image]

Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: vae + unet }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| --- | --- | --- | --- |
| NVIDIA GeForce RTX 4090 | 3.02 s | 1.85 s | 38.07% |

First compilation time: 878.19 seconds

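This workflow compiles both the VAE and the UNet with a LoRA applied. Outside ComfyUI, one way to keep a LoRA out of the way of compilation is to fuse its weights before compiling; the sketch below assumes the plain diffusers API, and the LoRA path and model id are illustrative, not from this PR:

import torch
from diffusers import StableDiffusionXLPipeline
import onediff.infer_compiler as infer_compiler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load and fuse the LoRA first so the compiled graph already contains the
# merged weights (loading a LoRA afterwards may change the module structure
# and force a recompile).
pipe.load_lora_weights("path/to/lora.safetensors")
pipe.fuse_lora()

pipe.unet = infer_compiler.compile(
    pipe.unet,
    backend="nexfort",
    options={"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True},
)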

ControlNet

ComfyUI Workflow

[cnet_speedup workflow image]

Result

{ model: sdxl, batch_size: 1, image: 1024x1024, speedup: controlnet }

| Accelerator | Baseline (non-optimized) | OneDiff (Nexfort) | Percentage improvement |
| --- | --- | --- | --- |
| NVIDIA GeForce RTX 4090 | 4.93 s | 4.07 s | 17.44% |

First compilation time: 437.84 seconds

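As with the other workflows, the ControlNet path can also be compiled directly. A sketch (the SDXL ControlNet pipeline and model ids are illustrative assumptions, not the node implementation):

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
import onediff.infer_compiler as infer_compiler

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Compile both the UNet and the ControlNet with the same nexfort options.
opts = {"mode": "max-autotune:cudagraphs", "dynamic": True, "fullgraph": True}
pipe.unet = infer_compiler.compile(pipe.unet, backend="nexfort", options=opts)
pipe.controlnet = infer_compiler.compile(pipe.controlnet, backend="nexfort", options=opts)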

IPAdapter