Hi @neo, in optimum-neuron we do compile the post_quant_conv layer. Here we override the VAE's forward with its decode function and trace both together, which means the VAE decoder exported by Optimum contains both the "vae decoder" and the post_quant_conv. That's what we do in optimum-neuron.
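Roughly, the idea looks like this (a simplified sketch, not our exact export code; the wrapper name and the example latent shape are placeholders):

```python
import torch
import torch_neuronx
from diffusers import AutoencoderKL

class VaeDecoderWrapper(torch.nn.Module):
    """Hypothetical wrapper: expose AutoencoderKL.decode as forward so that
    tracing captures post_quant_conv and the decoder in a single graph."""
    def __init__(self, vae: AutoencoderKL):
        super().__init__()
        self.vae = vae

    def forward(self, latent_sample):
        # AutoencoderKL.decode applies post_quant_conv and then the decoder
        return self.vae.decode(latent_sample).sample

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="vae"
)
example_latents = torch.randn(1, 4, 128, 128)  # placeholder latent shape for 1024x1024 output
neuron_decoder = torch_neuronx.trace(VaeDecoderWrapper(vae), example_latents)
```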
Hey @JingyaHuang! Thank you for such a quick and concise explanation ❤️
I did a quick search in the diffusers library for post_quant_conv; I guess I just didn't understand enough about the difference between AutoencoderKL and VQModel.
I also had another follow-up question from reading the aws neuron team's samples, which is also kind of related to the other discussion we had here: https://huggingface.co/aws-neuron/stable-diffusion-xl-base-1-0-1024x1024/discussions/3
I see that set_dynamic_batching is used only when loading the unet model with DataParallel, and I wonder: does dynamic_batch_size need to be a compile-time option passed via compiler_args, instead of being set when the model is loaded after compilation?
Thanks!! ❤️
Hi @neo,
For the diffusers implementation, I pointed you to the VQ-VAE, which might be slightly confusing, sorry for that. Whether it's the VQ-VAE (VQModel) or the regular VAE (AutoencoderKL), in Optimum we trace their decode function to avoid compiling two separate artifacts.
And for your question about dynamic batching: if you want to enable it, you need to set dynamic_batch_size=True when compiling your model with the API:
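For example (a minimal sketch; the model id, input shapes, and output path are placeholders, and the exact keyword set can vary with the optimum-neuron version):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

pipe = NeuronStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder model id
    export=True,
    dynamic_batch_size=True,  # allow batch sizes that are multiples of the compiled batch size
    batch_size=1,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
)
pipe.save_pretrained("sdxl_neuron/")  # placeholder output directory
```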
Or pass --dynamic-batch-size to the CLI:
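For example (again a sketch; flag spellings and the output path are placeholders and can differ slightly between versions):

```bash
optimum-cli export neuron \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --batch_size 1 --height 1024 --width 1024 \
  --num_images_per_prompt 1 \
  --dynamic-batch-size \
  sdxl_neuron/
```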
set_dynamic_batching is more of an option to turn off dynamic batching (when the model has been compiled with the option on) than the other way around.
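In the aws-neuron samples it shows up roughly like this (a sketch; the device ids and the path to the compiled UNet are placeholders):

```python
import torch
import torch_neuronx

# Load the compiled UNet and shard it across two NeuronCores.
# set_dynamic_batching=False opts out of dynamic batching for a model
# that was compiled with the option enabled.
unet = torch.jit.load("unet/model.pt")  # placeholder path to the compiled artifact
unet_parallel = torch_neuronx.DataParallel(unet, [0, 1], set_dynamic_batching=False)
```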
More details here
Okay so dynamic batching is indeed a compilation flag, thank you again for clarifying!!
Sorry, hopefully the last follow-up question 😂: then why does --num_images_per_prompt need to be a compilation option? Or is it just a convenience to set a default for the output config that can be overridden during inference?
With the samples from the aws neuron team, the VAE post_quant_conv is compiled: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_sdxl_base_1024_inference.ipynb
However, with our lib here, it's not being compiled; is that something we can do?
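For reference, the separate compilation in that sample looks roughly like this (a paraphrased sketch, not a verbatim copy of the notebook; the latent shape is a placeholder):

```python
import copy
import torch
import torch_neuronx
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")

# Trace the 1x1 post_quant_conv on its own, separately from the VAE decoder.
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
example_latents = torch.randn(1, 4, 128, 128)  # placeholder latent shape for 1024x1024 output
post_quant_conv_neuron = torch_neuronx.trace(post_quant_conv, example_latents)
```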