siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

[Bug] Workflow froze when using the Onediff nodes #1100

Closed Tedbees closed 2 months ago

Tedbees commented 2 months ago

Your current environment information

Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OneFlow version: none
Nexfort version: 0.1.dev268
OneDiff version: 1.2.0
OneDiffX version: none

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.35

Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100 80GB HBM3
GPU 1: NVIDIA H100 80GB HBM3

Nvidia driver version: 550.90.07
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 208
On-line CPU(s) list: 0-207
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 52
Socket(s): 2
Stepping: 8
BogoMIPS: 4000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd arat vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk avx512_fp16 arch_capabilities
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 6.5 MiB (208 instances)
L1i cache: 6.5 MiB (208 instances)
L2 cache: 416 MiB (104 instances)
L3 cache: 32 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-103
NUMA node1 CPU(s): 104-207
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Unknown: No mitigations
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Syscall hardening, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] diffusers==0.27.1
[pip3] numpy==1.26.4
[pip3] onnx==1.16.2
[pip3] onnxruntime==1.17.1
[pip3] onnxruntime-gpu==1.19.0
[pip3] open_clip_torch==2.26.1
[pip3] pytorch-lightning==2.4.0
[pip3] torch==2.4.0
[pip3] torchaudio==2.4.0
[pip3] torchmetrics==1.4.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.19.0
[pip3] transformers==4.38.2
[pip3] triton==3.0.0
[conda] No relevant packages

🐛 Describe the bug

Hi, I'm interested in your project and was trying to test it out. I configured everything per the guide and managed to load a simple workflow with the OneDiff checkpoint loader. After I queued the prompt, the workflow seemed to freeze without printing any errors. What might I have done wrong?

got prompt
model_type EPS
model weight dtype torch.float16, manual cast: None
Using pytorch attention in VAE
Using pytorch attention in VAE
loaded straight to GPU
Requested to load SDXL
Loading 1 new model
Cache lookup: Key='63aaf525-b852-4608-bb19-8511fd4b55bd', Cached Model Type='<class 'NoneType'>'
Requested to load SDXLClipModel
Loading 1 new model
  0%|                                                            | 0/8 [00:00<?, ?it/s]

Tedbees commented 2 months ago

Apparently it was running after all; it just took a long time (about 38 minutes, per the progress bar) before raising an exception:

got prompt
model_type EPS
model weight dtype torch.float16, manual cast: None
Using pytorch attention in VAE
Using pytorch attention in VAE
loaded straight to GPU
Requested to load SDXL
Loading 1 new model
Cache lookup: Key='63aaf525-b852-4608-bb19-8511fd4b55bd', Cached Model Type='<class 'NoneType'>'
Requested to load SDXLClipModel
Loading 1 new model
  0%|                                                            | 0/8 [38:41<?, ?it/s]
!!! Exception during processing!!! backend='nexfort' raised:
TypeError: Expected list, got str

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

Traceback (most recent call last):
  File "/app/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/app/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/app/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/app/ComfyUI/nodes.py", line 1382, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/app/ComfyUI/nodes.py", line 1352, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/app/ComfyUI/comfy/sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/app/ComfyUI/comfy/samplers.py", line 829, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/app/ComfyUI/comfy/samplers.py", line 729, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/app/ComfyUI/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/app/ComfyUI/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/app/ComfyUI/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/app/ComfyUI/comfy/k_diffusion/sampling.py", line 160, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/app/ComfyUI/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "/app/ComfyUI/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
  File "/app/ComfyUI/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "/app/ComfyUI/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "/app/ComfyUI/custom_nodes/onediff_comfy_nodes/modules/sd_hijack_utils.py", line 58, in hijacked_method
    return self(*args, **kwargs)
  File "/app/ComfyUI/custom_nodes/onediff_comfy_nodes/modules/sd_hijack_utils.py", line 95, in __call__
    return sub_func(self._orig_func, *args, **kwargs)
  File "/app/ComfyUI/custom_nodes/onediff_comfy_nodes/modules/nexfort/hijack_samplers.py", line 147, in calc_cond_batch_of
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/app/ComfyUI/comfy/model_base.py", line 123, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/onediff/infer_compiler/backends/nexfort/deployable_module.py", line 27, in forward
    return self._deployable_module_model(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
    return self._torchdynamo_orig_callable(
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
    result = self._inner_convert(
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
    return _compile(
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_utils_internal.py", line 84, in wrapper_function
    return StrobelightCompileTimeProfiler.profile_compile_time(
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
    r = func(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
    transformations(instructions, code_options)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 582, in transform
    tracer.run()
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
    super().run()
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
    while self.step():
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE
    self._return(inst)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
    self.output.compile_subgraph(
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1123, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1318, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
    r = func(*args, **kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1409, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1390, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/torch/__init__.py", line 1990, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
  File "/root/miniconda3/envs/venv_image/lib/python3.10/site-packages/nexfort/dynamo/backends/nexfort.py", line 34, in nexfort
    out = compile_fx(gm, example_inputs, mode=mode, options=options)
  File "src/nexfort/fx_compiler/fx_compiler.py", line 40, in nexfort.fx_compiler.fx_compiler.compile_fx
  File "src/nexfort/fx_compiler/fx_compiler.py", line 48, in nexfort.fx_compiler.fx_compiler.compile_fx
  File "src/nexfort/fx_compiler/compile_cache/compiled_graph_cache.py", line 46, in nexfort.fx_compiler.compile_cache.compiled_graph_cache.get_or_build_compiled_fn
  File "src/nexfort/fx_compiler/compile_cache/compiled_graph_entry.py", line 143, in nexfort.fx_compiler.compile_cache.compiled_graph_entry.compiled_fx_compiled_graph_hash
  File "src/nexfort/fx_compiler/compile_cache/compiled_graph_entry.py", line 124, in nexfort.fx_compiler.compile_cache.compiled_graph_entry.FxCompiledGraphHashDetails.debug_lines
torch._dynamo.exc.BackendCompilerFailed: backend='nexfort' raised:
TypeError: Expected list, got str

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

Prompt executed in 2324.05 seconds
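
For anyone trying to reproduce this, the two hints printed in the error message can be applied roughly as in the sketch below. It uses only the settings named in the log (`TORCH_LOGS`, `TORCHDYNAMO_VERBOSE`, and `torch._dynamo.config.suppress_errors`); the helper names `dynamo_debug_env` and `enable_eager_fallback` are made up for illustration and are not part of OneDiff, Nexfort, or ComfyUI. Note that suppressing the error only makes Dynamo fall back to eager execution for the failing graph, so it would hide the `TypeError` rather than fix the nexfort backend.

```python
import os


def dynamo_debug_env(base=None):
    """Return a copy of an environment mapping with the Dynamo debug variables set.

    Hypothetical helper: pass the result as `env=` when launching the ComfyUI
    process, so the variables are in place before torch is imported.
    """
    env = dict(os.environ if base is None else base)
    env["TORCH_LOGS"] = "+dynamo"        # verbose Dynamo logging, as the log suggests
    env["TORCHDYNAMO_VERBOSE"] = "1"     # fuller error reports from Dynamo
    return env


def enable_eager_fallback():
    """Apply the fallback from the error message.

    Returns True if the setting was applied, False if torch is not installed.
    With this set, a backend compile failure is logged and the graph runs in
    eager mode instead of raising BackendCompilerFailed.
    """
    try:
        import torch._dynamo
    except ImportError:
        return False
    torch._dynamo.config.suppress_errors = True
    return True
```

The environment variables are best set before the process starts (for example via `subprocess.run([...], env=dynamo_debug_env())`), while `enable_eager_fallback()` would have to run inside the ComfyUI process before the first sampling step.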