Yes, I think this is still the same problem that you encountered earlier. I'll try to get rid of it in the next few days.
Last update should have this issue fixed (and #124 as well).
Another good thing: I realized that I had misapplied the optimizations for the diffusers version. I have corrected this, and now, with full-on optimization (sliced attention + sequential offload + prior on CPU), it consumes about 3 GB of VRAM for a 1024x1024 image, which is quite impressive. And I haven't even tried to adapt xFormers/FlashAttention2 yet... 🤗🧨 ❤️ :)
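For reference, a minimal sketch of how these three optimizations are typically wired up with diffusers; the pipeline classes and model IDs below are the stock Kandinsky 2.2 ones from diffusers, assumed here for illustration (kubin's actual wiring may differ):

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

# Prior kept on the CPU in fp32 ("prior on CPU").
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
).to("cpu")

# Decoder loaded in half precision.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
decoder.enable_attention_slicing()       # "sliced attention"
decoder.enable_sequential_cpu_offload()  # "sequential offload"

# The prior's fp32 CPU embeddings must be cast and moved before
# they are handed to the fp16 decoder running on the GPU.
image_embeds, negative_embeds = prior("a test prompt").to_tuple()
image = decoder(
    image_embeds=image_embeds.half().to("cuda"),
    negative_image_embeds=negative_embeds.half().to("cuda"),
    height=1024,
    width=1024,
).images[0]
```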
Thanks for the hard work on the update; here is the newly reported error. It occurs when the run reaches the final stage. Here are my settings:

Enable half precision weights
Enable sliced attention
Enable sequential CPU offload
Enable prior generation on CPU
Enable xformers memory efficient attention

Traceback (most recent call last):
File "M:\ai\kubin\venv\lib\site-packages\gradio\routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1392, in process_api
result = await self.call_function(
File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1097, in call_function
prediction = await anyio.to_thread.run_sync(
File "M:\ai\kubin\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "M:\ai\kubin\venv\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "M:\ai\kubin\venv\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, args)
File "M:\ai\kubin\venv\lib\site-packages\gradio\utils.py", line 703, in wrapper
response = f(args, kwargs)
File "M:\ai\kubin\src\ui_blocks\t2i.py", line 253, in generate
return generate_fn(params)
File "M:\ai\kubin\src\web_gui.py", line 42, in memory_efficient_attention
does not support inputs:
query : shape=(1, 9216, 1, 512) (torch.float16)
key : shape=(1, 9216, 1, 512) (torch.float16)
value : shape=(1, 9216, 1, 512) (torch.float16)
attn_bias : <class 'NoneType'>
p : 0.0
smallkF
is not supported because:
dtype=torch.float16 (supported: {torch.float32})
max(query.shape[-1] != value.shape[-1]) > 32
has custom scale
unsupported embed per head: 512
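For reference, the failing call can be reproduced outside kubin. A standalone sketch assuming the public `xformers.ops` API, with the shapes and dtype copied from the log above:

```python
import torch
import xformers.ops as xops

# Tensors shaped exactly as in the log: (batch, seq_len, heads, head_dim).
q = torch.randn(1, 9216, 1, 512, dtype=torch.float16, device="cuda")
k = torch.randn(1, 9216, 1, 512, dtype=torch.float16, device="cuda")
v = torch.randn(1, 9216, 1, 512, dtype=torch.float16, device="cuda")

# On xFormers builds where no kernel accepts head_dim=512 in fp16, this
# raises the same "does not support inputs" error shown above.
out = xops.memory_efficient_attention(q, k, v, attn_bias=None, p=0.0)
```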
Try disabling either xFormers or prior generation on CPU; these two options currently conflict (xFormers cannot be applied to CPU tensors).
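A minimal sketch of that workaround, assuming stock diffusers pipelines (the function and flag names below are illustrative, not kubin's actual options):

```python
def configure_attention(prior, decoder, use_xformers: bool, prior_on_cpu: bool):
    if not use_xformers:
        return
    # The decoder runs on CUDA, so xFormers is safe to enable there.
    decoder.enable_xformers_memory_efficient_attention()
    # Skip the prior when it is pinned to the CPU: xFormers kernels are
    # CUDA-only and cannot be applied to CPU tensors.
    if not prior_on_cpu:
        prior.enable_xformers_memory_efficient_attention()
```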
Turning it off did fix it. Will xformers be fixed in the future? Is it a significant performance optimization? If not, then it doesn't matter.
To my surprise, xFormers hasn't yielded significant benefits; however, it's quite possible I'm not applying it correctly, and I haven't had enough time to test its use thoroughly yet. Eventually I will run further tests and report back if its usage proves justified.
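One way such a test could look (a rough sketch, not kubin code; the `decoder` pipeline and its call arguments are assumed):

```python
import time
import torch

def timed_generation(pipe, **kwargs):
    # Synchronize before and after so queued GPU work is actually counted.
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe(**kwargs)
    torch.cuda.synchronize()
    return time.perf_counter() - start

# baseline = timed_generation(decoder, image_embeds=..., height=1024, width=1024)
# decoder.enable_xformers_memory_efficient_attention()
# with_xf = timed_generation(decoder, image_embeds=..., height=1024, width=1024)
```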
Thanks for all your hard work. I'd also like to ask: the sampler in 2.2 is currently unselectable. Is it set up that way on purpose, or is it a problem with my installation?
The diffusers implementation currently has only one sampler available, which is why the sampler box is non-interactive for now. But I am planning to add more samplers, and when I do, the sampler selection box will be unblocked )
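For context: with diffusers, a "sampler" corresponds to a scheduler, and schedulers can be swapped on an existing pipeline. A minimal sketch, assuming a stock diffusers pipeline named `decoder`:

```python
from diffusers import DDIMScheduler

# Replace the pipeline's scheduler in place, reusing its existing config.
decoder.scheduler = DDIMScheduler.from_config(decoder.scheduler.config)
```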
Thanks for the reply.
Here are my settings:

Enable half precision weights
Enable sliced attention
Enable sequential CPU offload
Enable prior generation on CPU
This issue came up in the last version as well, and since a new problem has arisen, I can't say for sure whether the previous problem has been ruled out: https://github.com/seruva19/kubin/issues/124
I'm not sure if it's due to the update.
The error output:
Traceback (most recent call last):
File "M:\ai\kubin\venv\lib\site-packages\gradio\routes.py", line 439, in run_predict
output = await app.get_blocks().process_api(
File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1384, in process_api
result = await self.call_function(
File "M:\ai\kubin\venv\lib\site-packages\gradio\blocks.py", line 1089, in call_function
prediction = await anyio.to_thread.run_sync(
File "M:\ai\kubin\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "M:\ai\kubin\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "M:\ai\kubin\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "M:\ai\kubin\venv\lib\site-packages\gradio\utils.py", line 700, in wrapper
response = f(*args, **kwargs)
File "M:\ai\kubin\src\ui_blocks\t2i.py", line 253, in generate
return generate_fn(params)
File "M:\ai\kubin\src\web_gui.py", line 42, in <lambda>
generate_fn=lambda params: kubin.model.t2i(params),
File "M:\ai\kubin\src\models\model_diffusers22\model_22.py", line 168, in t2i
image_embeds, zero_embeds = prior(
File "M:\ai\kubin\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, *kwargs)
File "M:\ai\kubin\venv\lib\site-packages\diffusers\pipelines\kandinsky2_2\pipeline_kandinsky2_2_prior.py", line 494, in call
predicted_image_embedding = self.prior(
File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\prior_transformer.py", line 277, in forward
time_embeddings = self.time_embedding(timesteps_projected)
File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "M:\ai\kubin\venv\lib\site-packages\diffusers\models\embeddings.py", line 192, in forward
sample = self.linear_1(sample)
File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "M:\ai\kubin\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
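For what it's worth, this last error is what F.linear raises whenever a linear layer's weights and its input sit on different devices (here, cpu and cuda:0). A minimal standalone reproduction (a sketch, not kubin's actual code):

```python
import torch

linear = torch.nn.Linear(512, 512).to("cuda")  # weights on cuda:0
sample = torch.randn(1, 512)                   # input left on the CPU

try:
    linear(sample)  # RuntimeError: ... two devices, cpu and cuda:0!
except RuntimeError as err:
    print(err)

# The usual fix: move the input to the module's device before the call.
out = linear(sample.to(linear.weight.device))
```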