seruva19 / kubin

Web-GUI for Kandinsky text-to-image diffusion models.

I have a new problem with the update! #130

Closed klossm closed 1 year ago

klossm commented 1 year ago

Here are my settings:

Enable half precision weights
Enable sliced attention
Enable sequential CPU offload
Enable prior generation on CPU

This issue came up in the previous version too, and since a new problem has arisen, I can't say for sure whether the previous one was resolved: https://github.com/seruva19/kubin/issues/124

Not sure if it's due to the update

The error traceback:

```
Traceback (most recent call last):
  File "M:\ai\kubin\venv\lib\site-packages\gradio\routes.py", line 439, in run_predict
    output = await app.get_blocks().process_api(
  File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1384, in process_api
    result = await self.call_function(
  File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1089, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "M:\ai\kubin\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "M:\ai\kubin\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "M:\ai\kubin\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "M:\ai\kubin\venv\lib\site-packages\gradio\utils.py", line 700, in wrapper
    response = f(*args, **kwargs)
  File "M:\ai\kubin\src\ui_blocks\t2i.py", line 253, in generate
    return generate_fn(params)
  File "M:\ai\kubin\src\web_gui.py", line 42, in <lambda>
    generate_fn=lambda params: kubin.model.t2i(params),
  File "M:\ai\kubin\src\models\model_diffusers22\model_22.py", line 168, in t2i
    image_embeds, zero_embeds = prior(
  File "M:\ai\kubin\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "M:\ai\kubin\venv\lib\site-packages\diffusers\pipelines\kandinsky2_2\pipeline_kandinsky2_2_prior.py", line 494, in __call__
    predicted_image_embedding = self.prior(
  File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\prior_transformer.py", line 277, in forward
    time_embeddings = self.time_embedding(timesteps_projected)
  File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\embeddings.py", line 192, in forward
    sample = self.linear_1(sample)
  File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
```

seruva19 commented 1 year ago

Yes, I think this is still the same problem that you encountered earlier. I'll try to get rid of it in the next few days.
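
Roughly, the fix has to ensure that the embeddings produced by the CPU-bound prior are moved to the decoder's device before any CUDA module sees them. Below is a minimal sketch of the idea, assuming the standard diffusers Kandinsky 2.2 prior/decoder pipelines; the model IDs and variable names are illustrative, not the actual kubin code:

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# Prior runs on the CPU (in float32, since half-precision ops on CPU are limited),
# decoder runs on the GPU in half precision.
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
).to("cpu")
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

image_embeds, negative_image_embeds = prior(
    "portrait of a cat", guidance_scale=4.0
).to_tuple()

# Without this step, CPU tensors reach CUDA modules and PyTorch raises the
# "Expected all tensors to be on the same device" RuntimeError seen above.
image_embeds = image_embeds.to(device="cuda", dtype=torch.float16)
negative_image_embeds = negative_image_embeds.to(device="cuda", dtype=torch.float16)

image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=1024,
    width=1024,
).images[0]
```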

seruva19 commented 1 year ago

Last update should have this issue fixed (and #124 as well).

Another good thing: I realized that I had misapplied the optimizations for diffusers. I have corrected this, and now, with full-on optimization (sliced attention + sequential offload + prior on CPU), it consumes about 3 GB of VRAM for a 1024x1024 image, which is quite impressive. And I have not even tried to adapt xFormers/FlashAttention2 yet... 🤗🧨 ❤️ :)
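
For anyone curious, these options map more or less onto the standard diffusers switches. A rough sketch (not the exact kubin code; the public model IDs are used as placeholders):

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# "Enable half precision weights": load the decoder in float16.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)

# "Enable sliced attention": compute attention in slices to reduce peak VRAM.
decoder.enable_attention_slicing()

# "Enable sequential CPU offload": keep submodules on the CPU and stream them
# to the GPU one at a time during the forward pass (slower, but very frugal).
decoder.enable_sequential_cpu_offload()

# "Enable prior generation on CPU": keep the prior pipeline on the CPU
# (float32, since half-precision support on CPU is limited).
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
).to("cpu")
```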

klossm commented 1 year ago

Thanks for the hard work on the update. Here's the newly reported error; it occurs when the run reaches the final stage. Here are my settings:

Enable half precision weights
Enable sliced attention
Enable sequential CPU offload
Enable prior generation on CPU
Enable xformers memory efficient attention

File "M:\ai\kubin\venv\lib\site-packages\gradio\routes.py", line 442, in run_predict output = await app.get_blocks().process_api( File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1392, in process_api result = await self.call_function( File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1097, in call_function prediction = await anyio.to_thread.run_sync( File "M:\ai\kubin\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync return await get_asynclib().run_sync_in_worker_thread( File "M:\ai\kubin\venv\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread return await future File "M:\ai\kubin\venv\lib\site-packages\anyio_backends_asyncio.py", line 807, in run result = context.run(func, args) File "M:\ai\kubin\venv\lib\site-packages\gradio\utils.py", line 703, in wrapper response = f(args, kwargs) File "M:\ai\kubin\src\ui_blocks\t2i.py", line 253, in generate return generate_fn(params) File "M:\ai\kubin\src\web_gui.py", line 42, in generate_fn=lambda params: kubin.model.t2i(params), File "M:\ai\kubin\src\models\model_diffusers22\model_22.py", line 202, in t2i current_batch = decoder( File "M:\ai\kubin\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, *kwargs) File "M:\ai\kubin\venv\lib\site-packages\diffusers\pipelines\kandinsky2_2\pipeline_kandinsky2_2.py", line 274, in call image = self.movq.decode(latents, force_not_quantize=True)["sample"] File "M:\ai\kubin\venv\lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper return method(self, args, kwargs) File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\vq_model.py", line 139, in decode dec = self.decoder(quant2, quant if self.config.norm_type == "spatial" else None) File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\vae.py", line 265, in forward sample = self.mid_block(sample, latent_embeds) File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 511, in forward hidden_states = attn(hidden_states, temb=temb) File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\attention_processor.py", line 322, in forward return self.processor( File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\attention_processor.py", line 1034, in call hidden_states = xformers.ops.memory_efficient_attention( File "M:\ai\kubin\venv\lib\site-packages\xformers\ops\fmha__init.py", line 192, in memory_efficient_attention return _memory_efficient_attention( File "M:\ai\kubin\venv\lib\site-packages\xformers\ops\fmha__init__.py", line 290, in _memory_efficient_attention return _memory_efficient_attention_forward( File "M:\ai\kubin\venv\lib\site-packages\xformers\ops\fmha\init__.py", line 308, in _memory_efficient_attention_forward _ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op, inp) File "M:\ai\kubin\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 45, in _ensure_op_supports_or_raise raise exc_type( ValueError: Operator memory_efficient_attention does not support inputs: query : shape=(1, 9216, 1, 512) (torch.float16) key : shape=(1, 9216, 1, 512) (torch.float16) value : 
shape=(1, 9216, 1, 512) (torch.float16) attn_bias : <class 'NoneType'> p : 0.0 smallkF is not supported because: dtype=torch.float16 (supported: {torch.float32}) max(query.shape[-1] != value.shape[-1]) > 32 has custom scale unsupported embed per head: 512

seruva19 commented 1 year ago

Try disabling xFormers or disabling prior CPU generation; these two options currently conflict (xFormers cannot be applied to CPU tensors).
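
If it helps, this is roughly the kind of guard I have in mind (a sketch with hypothetical names, not the actual kubin code): enable xFormers only for modules that actually live on the GPU, and leave the MoVQ autoencoder on the default attention processor, since its 512-dim attention heads are what the xFormers ops rejected in the traceback above.

```python
def apply_xformers_safely(prior_pipe, decoder_pipe, use_xformers: bool):
    """Hypothetical helper: enable xFormers only where it can actually run."""
    if not use_xformers:
        return
    # xFormers kernels operate on CUDA tensors only, so skip the prior
    # when it is configured to generate on the CPU.
    if next(prior_pipe.prior.parameters()).device.type == "cuda":
        prior_pipe.prior.enable_xformers_memory_efficient_attention()
    # Enable it for the decoder's UNet only; the MoVQ decoder keeps the
    # default attention processor because of the unsupported head size.
    decoder_pipe.unet.enable_xformers_memory_efficient_attention()
```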

klossm commented 1 year ago

Try to disable xformers.

Turning xformers off did fix it. Will xformers support be fixed in the future? Is it a significant performance optimization? If not, then it doesn't matter.

seruva19 commented 1 year ago

To my surprise, xFormers haven't yielded significant benefits; however, it's quite possible I'm not applying them correctly, and I haven't had enough time to thoroughly test their use yet. Eventually I will conduct further tests and will report back if their usage proves to be justified.

klossm commented 1 year ago

Thanks for all your hard work. Also, I would like to ask about the sampler in 2.2, which is currently unselectable: is it set up that way intentionally, or is this the result of a problem with my installation?

seruva19 commented 1 year ago

The diffusers implementation currently has only one sampler available, which is why the sampler box is non-interactive for now. But I am planning to add more samplers, and when I do, the sampler selection box will be unblocked :)
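
For context, with diffusers extra samplers would most likely be exposed by swapping the decoder pipeline's scheduler. A rough sketch with an assumed sampler list (which schedulers actually work well with Kandinsky 2.2 still needs testing):

```python
from diffusers import DDIMScheduler, DDPMScheduler

# Hypothetical sampler registry; each entry is rebuilt from the pipeline's
# current scheduler config so that the noise-schedule settings are preserved.
SAMPLERS = {
    "DDPM": DDPMScheduler,
    "DDIM": DDIMScheduler,
}

def set_sampler(decoder_pipe, name: str):
    decoder_pipe.scheduler = SAMPLERS[name].from_config(decoder_pipe.scheduler.config)
```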

klossm commented 1 year ago

Thanks for the reply.