Open CaptainVarghoss opened 9 months ago
This extension hasn't worked for a while unless you use the dev branch, and even then it stopped working a while ago. Not sure what NVIDIA is doing, but this is shameful.
I'm guessing you haven't kept up with it. It was updated a little over a month ago and it works fine in A1111. It requires that your files be in specific places (not sub-folders), and a couple of people hit issues installing it, but I use it constantly in A1111. At roughly double the it/s it's still ~40% faster than Forge, and at this point it's the only thing keeping me from switching over completely.
The extension also seems to work just fine in Forge as far as I can tell, and Forge even auto-loads the unet, but something about how the unet handling was changed in Forge means it never actually gets used for inference.
Maybe it's because the main branch doesn't currently work with ControlNet, which is built into Forge.
Man, you're lucky then. Many of us can't get it to work. Installing it results in the DLL complaint, and it doesn't even show up in the UI anymore.
I would love it if Forge could do what InvokeAI does for diffusers models. It lets you convert a checkpoint to the diffusers format with one click, with no messing with any settings, and it works really well.
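For reference, the diffusers library can do roughly the same conversion in a couple of lines. This is only a sketch of the equivalent (the paths are placeholders and InvokeAI's actual implementation may differ):

import torch
from diffusers import StableDiffusionPipeline

# load a single-file checkpoint and write it back out in the diffusers
# multi-folder format (assumes diffusers >= 0.21 for from_single_file)
pipe = StableDiffusionPipeline.from_single_file("mymodel.safetensors", torch_dtype=torch.float16)
pipe.save_pretrained("mymodel_diffusers")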
It's not luck; there's an entire thread in the issues on the TRT extension GitHub that shows how to fix those problems.
The documentation is garbage and installation is not as one-click as it could be, but it works fine.
Many people in their help section have the same issues. I'd say we would be better off if this adopted the backend that SD.Next and ComfyUI use.
I got it working once, but it was so buggy it stopped working.
https://github.com/chengzeyi/stable-fast would be welcome as well, as it works with ControlNet.
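From memory of the stable-fast README, usage is roughly the following. Treat the import path as an assumption, since it has moved between releases:

import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.diffusion_pipeline_compiler import compile, CompilationConfig

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# compile the whole pipeline in place; per the README, ControlNet
# pipelines are handled the same way
config = CompilationConfig.Default()
config.enable_cuda_graph = True
pipe = compile(pipe, config)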
I just started looking at Forge, but it seems to me that there is no unified way of getting the TRT extension working on both Auto1111 and Forge, as they use different APIs to overwrite the UNet. If anyone has more experience with the Forge codebase, I'd be happy to learn the best practices for implementing a different backend and put together a POC.
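For anyone wanting to compare the two sides: on the A1111 side the extension plugs into the sd_unet API, roughly like this (a sketch from memory of modules/sd_unet.py; the actual TensorRT engine call is elided):

from modules import script_callbacks, sd_unet

class TrtUnetOption(sd_unet.SdUnetOption):
    def __init__(self, name):
        self.label = f"[TRT] {name}"
        self.model_name = name

    def create_unet(self):
        return TrtUnet(self.model_name)

class TrtUnet(sd_unet.SdUnet):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def forward(self, x, timesteps, context, *args, **kwargs):
        # this is where the TensorRT engine would replace the torch UNet
        raise NotImplementedError("run the TensorRT engine here")

def list_unets(unet_list):
    unet_list.append(TrtUnetOption("mymodel"))

script_callbacks.on_list_unets(list_unets)

Forge swapped this path out for its own model patching, so a portable backend would have to detect which of the two APIs is available at runtime.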
TensorRT works fine for me for the generated files I had in automatic1111, but it fails when I try to generate new TensorRT files.
D:\stable-diffusion-webui-forge\ldm_patched\ldm\modules\diffusionmodules\openaimodel.py:857: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert y.shape[0] == x.shape[0]
D:\stable-diffusion-webui-forge\ldm_patched\ldm\modules\diffusionmodules\openaimodel.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1] == self.channels
ERROR:root:Exporting to ONNX failed. module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'
Building TensorRT engine... This can take a while, please check the progress in the terminal.
Building TensorRT engine for D:\stable-diffusion-webui-forge\models\Unet-onnx\mymodel.onnx: D:\stable-diffusion-webui-forge\models\Unet-trt\mymodel_78890989_cc86_sample=1x4x96x96+2x4x128x128+4x4x256x256-timesteps=1+2+4-encoder_hidden_states=1x77x2048+2x77x2048+4x231x2048-y=1x2816+2x2816+4x2816.trt
Could not open file D:\stable-diffusion-webui-forge\models\Unet-onnx\mymodel.onnx
I somehow found a workaround for this error and can now successfully "Export Default Engine" (in other WebUIs and repos this is called "creating a profile in TensorRT").
But I'm a novice programmer, so chances are I don't actually know what I'm doing. Someone might want to check what this actually does before trying the same.
At line 123 of webui_forge_cu121_torch21\webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py, replace the swap_sdpa function with the following:
def swap_sdpa(func):
    def wrapper(*args, **kwargs):
        # does torch.nn.functional still expose SDPA?
        swap_sdpa = hasattr(F, "scaled_dot_product_attention")
        print('#### Exporter.swap_sdpa hasattr(scaled_dot_product_attention) :: ' + str(swap_sdpa))
        old_sdpa = (
            getattr(F, "scaled_dot_product_attention", None) if swap_sdpa else None
        )
        # original behavior: delete SDPA, then run the export without it
        # if swap_sdpa:
        #     delattr(F, "scaled_dot_product_attention")
        # ret = func(*args, **kwargs)
        if swap_sdpa and old_sdpa:
            delattr(F, "scaled_dot_product_attention")
            setattr(F, "scaled_dot_product_attention", old_sdpa)
        ret = func(*args, **kwargs)
        return ret
    return wrapper
My guess is that the ret = func(*args, **kwargs) part happens before setattr(..., old_sdpa) but after delattr(...), so the scaled_dot_product_attention method is gone from torch.nn.functional while the export runs...?
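That ordering is easy to demonstrate standalone; this is just an illustration, not extension code:

import torch.nn.functional as F

saved = F.scaled_dot_product_attention
delattr(F, "scaled_dot_product_attention")
# anything exported here that calls F.scaled_dot_product_attention now
# fails with the exact AttributeError from the log above
print(hasattr(F, "scaled_dot_product_attention"))  # False
setattr(F, "scaled_dot_product_attention", saved)
print(hasattr(F, "scaled_dot_product_attention"))  # True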
@ceoper The swap_sdpa was a workaround (WAR) for an issue in torch < 2.0 when exporting ONNX models. As Forge uses a newer torch version, it should be sufficient to simply comment out the @swap_sdpa decorator.
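In practice that means an edit like the following in exporter.py, assuming the decorator sits directly above the ONNX export function (the function name and signature here are illustrative, not copied from the extension):

# @swap_sdpa  # workaround for torch < 2.0; Forge ships torch 2.x, so disable it
def export_onnx(onnx_path, modelobj, profile):
    ...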
@ceoper's fix together with --always-gpu solved it for me.
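For anyone else trying this: on a standard Windows install the flag goes into webui-user.bat (the usual A1111/Forge convention; adjust for your launcher), e.g.:

set COMMANDLINE_ARGS=--always-gpu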
Checklist
What happened?
The TensorRT extension installs and seems to function properly on a clean install. The console shows the unet is loaded and a TRT profile is loaded, but there is no change in generation time.
Steps to reproduce the problem
What should have happened?
Image generation speed should have increased significantly.
What browsers do you use to access the UI?
Google Chrome
Sysinfo
Console logs
Additional information
No response