utterances-bot opened 1 year ago
Can the magic function solution generate a Chinese calligraphy album in a specific, personalized calligraphy style?
"The encoder of the VAE is only required during training and not during inference."
Did you mean the decoder? The encoder should still be needed, because the noise prediction (which we try to minimise) works with latents, not images, right? The decoder, on the other hand, is only needed to train the noise predictor, not to use it later.
Another question: why not use varying activations in a single neuron rather than a space-inefficient one-hot-encoded vector? E.g., activations of 1, 2, 3, 4, etc., each meaning a different thing, instead of a one-hot-encoded vector. It would also save the compute required to train the text encoder.
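To make the comparison in this question concrete, here is a minimal pure-Python sketch. The four-token vocabulary, the 2-dimensional embedding table, and all function names are hypothetical and chosen only for illustration. It contrasts the two encodings by their pairwise distances: one-hot codes keep every pair of tokens equally far apart, while a single activation of 1, 2, 3, 4 places the tokens on a line, so some pairs end up closer than others. It also shows that multiplying a one-hot vector by an embedding table just selects one row, which is why implementations typically perform an index lookup instead of storing the full vector.

```python
import math

# Hypothetical 4-token vocabulary, used only for illustration.
VOCAB = ["cat", "dog", "car", "tree"]

def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def scalar_code(index):
    """Encode a token as a single activation: 1.0, 2.0, 3.0, ..."""
    return [float(index + 1)]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_distances(codes):
    """Map each token pair (i, j) with i < j to the distance between codes."""
    n = len(codes)
    return {(i, j): distance(codes[i], codes[j])
            for i in range(n) for j in range(i + 1, n)}

one_hot_codes = [one_hot(i, len(VOCAB)) for i in range(len(VOCAB))]
scalar_codes = [scalar_code(i) for i in range(len(VOCAB))]

one_hot_d = pairwise_distances(one_hot_codes)  # every pair: sqrt(2)
scalar_d = pairwise_distances(scalar_codes)    # distance equals |i - j|

# Hypothetical embedding table: one 2-dimensional row per token.
EMBEDDINGS = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

def matvec_lookup(one_hot_vec, table):
    """Multiply a one-hot row vector by the table; this selects one row."""
    dim = len(table[0])
    return [sum(one_hot_vec[r] * table[r][c] for r in range(len(table)))
            for c in range(dim)]

if __name__ == "__main__":
    print("one-hot distances:", sorted(set(one_hot_d.values())))
    print("scalar distances: ", sorted(set(scalar_d.values())))
    print("row via matmul:   ", matvec_lookup(one_hot_codes[2], EMBEDDINGS))
    print("row via indexing: ", EMBEDDINGS[2])
```

In this sketch the scalar code treats "cat" and "tree" as three times further apart than "cat" and "dog", an ordering that comes from the numbering and not from the data, whereas the one-hot code makes no such assumption.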
rekilblog - A different way to look at Stable Diffusion
https://rekil156.github.io/rekilblog/posts/lesson9_stableDissufion/Lesson9.html