arjunsuresh opened this issue 2 months ago
Hi,
May I ask if it is possible to test this on an fp32 model?
Specifically, change `precision: str = 'fp16'` in the custom config (see the sketch after the builder code below) and change `SDXLVAEBuilder` in `code/stable-diffusion-xl/tensorrt/builder.py` to the following:
```python
class SDXLVAEBuilder(SDXLBaseBuilder,
                     ArgDiscarder):
    """SDXL VAE builder class.
    """

    def __init__(self,
                 *args,
                 component_name: str,
                 batch_size: int,
                 model_path: PathLike,
                 **kwargs):
        # Changed: hard-code fp32 precision for the VAE component
        vae_precision = 'fp32'
        # Changed: point at the fp32 ONNX export of the VAE
        # (string concatenation, so model_path is expected to end with '/')
        vae_path = model_path + "onnx_models/vae/model.onnx"
        strongly_typed = False
        super().__init__(*args,
                         model=VAE(name=component_name,
                                   max_batch_size=batch_size,
                                   precision=vae_precision,
                                   device='cuda'),
                         model_path=vae_path,
                         batch_size=batch_size,
                         strongly_typed=strongly_typed,
                         use_native_instance_norm=True,
                         **kwargs)
```
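For reference, the config-side change mentioned above would look roughly like the sketch below. This is a minimal sketch assuming the custom config follows the usual NVIDIA MLPerf pattern of a registered config class; the class name, system name, and everything other than the `precision` field are placeholders, not values taken from this thread.

```python
# Hypothetical sketch of the custom config change
# (e.g. configs/stable-diffusion-xl/Offline/custom.py).
# Only the precision override reflects the change discussed above;
# the class/system names are illustrative.
from . import *


@ConfigRegistry.register(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
class CUSTOM_SYSTEM(OfflineGPUBaseConfig):
    system = KnownSystem.CUSTOM_SYSTEM
    precision: str = 'fp32'  # changed from 'fp16'
```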
then run `make generate_engines RUN_ARGS="--benchmarks=stable-diffusion-xl --scenarios=Offline"`.
I was unable to build the int8 models on an RTX 3090, but the fp16/fp32 models worked and I got them running.
That's great. But the same code didn't work for me: I'm still getting a segmentation fault for both fp16 and fp32.
That's strange... Personally, I think the error is due to the model not being quantized/exported properly, or the data not being preprocessed properly.
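One quick way to rule out a broken export is to sanity-check the ONNX file before building the engine. A minimal sketch, assuming the `onnx` Python package is installed; the path below is illustrative and should be adjusted to wherever the VAE was actually exported (this only checks the export side, not the preprocessed data):

```python
import onnx

# Illustrative path; adjust to the actual location of the exported VAE model.
vae_onnx_path = "build/models/SDXL/onnx_models/vae/model.onnx"

# Run the standard ONNX structural checks; a model that fails here is a likely
# cause of crashes later during TensorRT engine building.
onnx.checker.check_model(vae_onnx_path)
print("ONNX checker passed for", vae_onnx_path)
```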
If you still have the command history, could you post all of the `make` commands here? Thanks.
@Oseltamivir For me, the line below is the culprit. Commenting it out makes it work for me, though I haven't checked everything. I'll also try on different systems.
In that case it might be an issue caused by mitten.
But the maintainers don't seem to respond to issues posted in the mitten repo. I opened an issue there 3 months ago but got no reply. I ended up having to email Yiheng to ask about their implementation of mitten.
Yes, it is. We'll come back to it; currently we are using the NVIDIA v4.0 code to collect the inference results via GitHub Actions.
Trying to run the NVIDIA v4.1 implementation for stable diffusion on an RTX 4090.
It ran successfully and produced the int8 model. Below are the custom configs used for 2x RTX 4090.