pinokiofactory / clarity-refiners-ui

Creative Image Enhancer/Upscaler. Powered By Refiners. 8GB VRAM | 10GB Install

Problem #5

Open djkakadu opened 3 days ago

djkakadu commented 3 days ago

Hello, with every image I try I get this error:

```
❌ Error during processing:
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 858, in forward
    super().forward(inputs)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 922, in forward
    return super().forward(inputs) + inputs[0]
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 922, in forward
    return super().forward(inputs) + inputs[0]
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\chain.py", line 249, in forward
    result = self._call_layer(layer, name, *intermediate_args)
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\attentions.py", line 129, in forward
    return self._process_attention(
  File "C:\pinokio\api\clarity-refiners-ui.git\app\env\lib\site-packages\refiners\fluxion\layers\attentions.py", line 29, in scaled_dot_product_attention
    return _scaled_dot_product_attention(
OutOfMemoryError: CUDA out of memory. Tried to allocate 7.75 GiB. GPU

(CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False)
├── (PAR)
│   └── Identity() (x3)
├── (DISTR)
│   └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16) (x3)
├── >>> ScaledDotProductAttention(num_heads=8) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1.SelfAttention.ScaledDotProductAttention
└── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-6.97, max=7.16, mean=-0.05, std=1.51, norm=4841.88, grad=False)
1: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-6.69, max=7.00, mean=-0.01, std=1.58, norm=5069.41, grad=False)
2: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-2.86, max=2.66, mean=0.01, std=0.46, norm=1474.05, grad=False)

(RES) Residual()
├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
└── >>> (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1.SelfAttention
    ├── (PAR)
    │   └── Identity() (x3)
    ├── (DISTR)
    │   └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16) (x3)
    ├── ScaledDotProductAttention(num_heads=8)
    └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-2.91, max=2.64, mean=0.01, std=0.54, norm=1737.50, grad=False)

(CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False)
├── >>> (RES) Residual() | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock.Residual_1 #1
│   ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│   └── (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False)
│       ├── (PAR) ...
│       ├── (DISTR) ...
│       ├── ScaledDotProductAttention(num_heads=8)
│       └── Linear(in_features=320, out_features=320, device=cuda:0, dtype=bfloat16)
├── (RES) Residual() #2
│   ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
│   ├── (PAR)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(CHAIN)
└── >>> (CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2.CrossAttentionBlock
    ├── (RES) Residual() #1
    │   ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
    │   └── (CHAIN) SelfAttention(embedding_dim=320, num_heads=8, inner_dim=320, use_bias=False) ...
    ├── (RES) Residual() #2
    │   ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
    │   ├── (PAR) ...
    │   └── (CHAIN) Attention(embedding_dim=320, num_heads=8, key_embedding_dim=768, value_embedding_dim=768, inner_dim=320, use_bias=False) ...
    └── (RES) Residual() #3
        ├── LayerNorm(normalized_shape=(320,), device=cuda:0, dtype=bfloat16)
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(RES) CLIPLCrossAttention(channels=320)
├── (CHAIN) #1
│   ├── GroupNorm(num_groups=32, eps=1e-06, channels=320, device=cuda:0, dtype=bfloat16)
│   ├── Conv2d(in_channels=320, out_channels=320, kernel_size=(1, 1), device=cuda:0, dtype=bfloat16)
│   ├── (CHAIN) StatefulFlatten(start_dim=2)
│   │   ├── SetContext(context=flatten, key=sizes)
│   │   └── Flatten(start_dim=2)
│   └── Transpose(dim0=1, dim1=2)
├── >>> (CHAIN) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention.Chain_2 #2
│   └── (CHAIN) CrossAttentionBlock(embedding_dim=320, context_embedding_dim=768, context_key=clip_text_embedding, num_heads=8, use_bias=False)
│       ├── (RES) Residual() #1 ...
0: Tensor(shape=(2, 16128, 320), dtype=bfloat16, device=cuda:0, min=-1.66, max=1.97, mean=0.01, std=0.24, norm=780.96, grad=False)

(CHAIN)
├── (SUM) ResidualBlock(in_channels=320, out_channels=320)
│   ├── (CHAIN)
│   │   ├── GroupNorm(num_groups=32, channels=320, device=cuda:0, dtype=bfloat16) #1
│   │   ├── SiLU() #1
│   │   ├── (SUM) RangeAdapter2d(channels=320, embedding_dim=1280) ...
│   │   ├── GroupNorm(num_groups=32, channels=320, device=cuda:0, dtype=bfloat16) #2
│   │   ├── SiLU() #2
│   │   └── Conv2d(in_channels=320, out_channels=320, kernel_size=(3, 3), padding=(1, 1), device=cuda:0, dtype=bfloat16)
│   └── Identity()
├── >>> (RES) CLIPLCrossAttention(channels=320) | SD1UNet.Controlnet.DownBlocks.Chain_2.CLIPLCrossAttention
0: Tensor(shape=(2, 320, 144, 112), dtype=bfloat16, device=cuda:0, min=-8.19, max=6.47, mean=-0.09, std=0.69, norm=2236.34, grad=False)

(CHAIN) DownBlocks(in_channels=4)
├── (CHAIN) #1
│   ├── Conv2d(in_channels=4, out_channels=320, kernel_size=(3, 3), padding=(1, 1), device=cuda:0, dtype=bfloat16)
│   ├── (RES) Residual()
│   │   ├── UseContext(context=controlnet, key=condition_tile)
│   │   └── (CHAIN) ConditionEncoder() ...
│   └── (PASS)
│       ├── Conv2d(in_channels=320, out_channels=320, kernel_size=(1, 1), device=cuda:0, dtype=bfloat16)
│       └── Lambda(_store_residual(x: torch.Tensor))
├── >>> (CHAIN) (x2) | SD1UNet.Controlnet.DownBlocks.Chain_2 #2
│   ├── (SUM) ResidualBlock(in_channels=320, out_channels=320)
0: Tensor(shape=(2, 320, 144, 112), dtype=bfloat16, device=cuda:0, min=-6.19, max=6.31, mean=-0.04, std=0.67, norm=2141.49, grad=False)

(PASS) Controlnet(name=tile, scale=0.6)
├── (PASS) TimestepEncoder()
│   ├── UseContext(context=diffusion, key=timestep)
│   ├── (CHAIN) RangeEncoder(sinusoidal_embedding_dim=320, embedding_dim=1280)
│   │   ├── Lambda(compute_sinusoidal_embedding(x: jaxtyping.Int[Tensor, 'batch 1']) -> jaxtyping.Float[Tensor, 'batch 1 embedding_dim'])
│   │   ├── Converter(set_device=False)
│   │   ├── Linear(in_features=320, out_features=1280, device=cuda:0, dtype=bfloat16) #1
│   │   ├── SiLU()
│   │   └── Linear(in_features=1280, out_features=1280, device=cuda:0, dtype=bfloat16) #2
│   └── SetContext(context=range_adapter, key=timestep_embedding_tile)
├── Slicing(dim=1, end=4)
0: Tensor(shape=(2, 4, 144, 112), dtype=bfloat16, device=cuda:0, min=-3.31, max=4.03, mean=0.48, std=1.11, norm=433.98, grad=False)

(CHAIN) SD1UNet(in_channels=4)
├── >>> (PASS) Controlnet(name=tile, scale=0.6) | SD1UNet.Controlnet
│   ├── (PASS) TimestepEncoder()
│   │   ├── UseContext(context=diffusion, key=timestep)
│   │   ├── (CHAIN) RangeEncoder(sinusoidal_embedding_dim=320, embedding_dim=1280) ...
│   │   └── SetContext(context=range_adapter, key=timestep_embedding_tile)
│   ├── Slicing(dim=1, end=4)
│   ├── (CHAIN) DownBlocks(in_channels=4)
│   │   ├── (CHAIN) #1 ...
│   │   ├── (CHAIN) (x2) #2 ...
│   │   ├── (CHAIN) #3 ...
0: Tensor(shape=(2, 4, 144, 112), dtype=bfloat16, device=cuda:0, min=-3.31, max=4.03, mean=0.48, std=1.11, norm=433.98, grad=False)
```

ai-anchorite commented 3 days ago

That doesn't look fun.

The fatal part is: `CUDA out of memory. Tried to allocate 7.75 GiB. GPU`

Are you using an older or workstation GPU?

djkakadu commented 3 days ago

I use a 2060 Super 8GB.

ai-anchorite commented 3 days ago

Hmm. Did you disable the CUDA Sysmem Fallback Policy in the NVIDIA Control Panel at some point? Otherwise, make sure your NVIDIA driver is up to date. It should happily fall back to shared memory and just slow down if you exceed 8GB of VRAM.

ai-anchorite commented 3 days ago

Oh, actually, I think bfloat16 has only been supported in hardware since NVIDIA Ampere (RTX 30-series). That's worth checking into.
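You can check this from the GPU's CUDA compute capability: Ampere is the first generation at major version 8, while Turing cards like the 2060 Super are 7.5. A minimal sketch (the `pick_dtype` helper is hypothetical, not part of the app):

```python
# Hypothetical helper: choose a dtype string from the GPU's CUDA compute
# capability instead of hardcoding "bfloat16".
# Native bfloat16 requires compute capability >= 8.0 (Ampere, RTX 30-series);
# older cards such as the 2060 Super (Turing, 7.5) should use float16.

def pick_dtype(compute_capability: tuple[int, int]) -> str:
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"

# With PyTorch installed, the capability can be queried at runtime, e.g.:
#   import torch
#   dtype_name = pick_dtype(torch.cuda.get_device_capability(0))

print(pick_dtype((7, 5)))  # 2060 Super (Turing) -> float16
print(pick_dtype((8, 6)))  # RTX 3060 (Ampere)   -> bfloat16
```

PyTorch also exposes `torch.cuda.is_bf16_supported()` as a direct check, but the capability tuple makes the Turing/Ampere cutoff explicit.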

It was changed from float16 to bfloat16 so that (phat) Macs would work better, but that's not helpful if it breaks things for 10-series and 20-series NVIDIA cards!

It was changed in two places in this commit: https://github.com/pinokiofactory/clarity-refiners-ui/commit/80a3efdcdc2dd87ddd635518df5b69fad23e1920

line 111: https://github.com/pinokiofactory/clarity-refiners-ui/blob/c0ba342914c6b46da5c778077aef7725b26b2237/app/app.py#L111

and 126: https://github.com/pinokiofactory/clarity-refiners-ui/blob/c0ba342914c6b46da5c778077aef7725b26b2237/app/app.py#L126

change: "bfloat16" to "float16"