Closed: MysticDaedra closed this issue 4 months ago
please upload the actual log here, it's hard to follow link-to-link-to-log and then download it.
I totally forgot I could just upload the file, apologies. sdnext.log
i cannot reproduce the problem, see my log below:
09:54:34-225971 INFO Autodetect: model="Stable Diffusion XL" class=StableDiffusionXLPipeline file="/mnt/models/Stable-diffusion/sdxl/miaanimeSFWNSFWSDXL_v40.safetensors" size=6617MB
09:54:39-637988 DEBUG Setting model: pipeline=StableDiffusionXLPipeline config={'low_cpu_mem_usage': True, 'torch_dtype': torch.float16, 'load_connected_pipeline': True, 'extract_ema': True, 'original_config_file': 'configs/sd_xl_base.yaml', 'use_safetensors': True}
09:54:39-639071 DEBUG Setting model: enable VAE slicing
09:54:42-886140 INFO Model compile: pipeline=StableDiffusionXLPipeline mode=reduce-overhead backend=stable-fast fullgraph=True compile=['Model', 'VAE']
09:54:42-988677 INFO Model compile: task='Stable-fast' config={'memory_format': torch.contiguous_format, 'enable_jit': True, 'enable_jit_freeze': True, 'preserve_parameters': True, 'enable_cnn_optimization': True, 'enable_fused_linear_geglu': True, 'prefer_lowp_gemm': True, 'enable_xformers': False, 'enable_cuda_graph': True,
'enable_triton': True, 'trace_scheduler': False} time=0.02
09:54:43-908654 DEBUG GC: collected=143 device=cuda {'ram': {'used': 1.38, 'total': 47.05}, 'gpu': {'used': 8.59, 'total': 23.99}, 'retries': 0, 'oom': 0} time=0.26
09:54:43-914407 INFO Load model: time=9.43 load=9.43 native=1024 {'ram': {'used': 1.38, 'total': 47.05}, 'gpu': {'used': 8.59, 'total': 23.99}, 'retries': 0, 'oom': 0}
09:55:19-933228 INFO Applying hypertile: unet=320
09:55:19-951866 INFO Base: class=StableDiffusionXLPipeline
09:55:20-319214 DEBUG Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]), 'negative_prompt_embeds': torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 6,
'generator': device(type='cuda'), 'num_inference_steps': 10, 'eta': 1.0, 'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 1280, 'height': 720, 'parser': 'Full parser'}
09:55:20-328863 DEBUG Sampler: sampler="UniPC" config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon', 'solver_order': 2, 'thresholding': False, 'sample_max_value': 1.0, 'predict_x0': 'bh1', 'lower_order_final': True}
Progress 3.75it/s █████████████████████████████████ 100% 10/10 00:02 00:00 Base
09:55:30-702229 INFO Saving: image="outputs/text/02031-miaanimeSFWNSFWSDXL_v40-mad max young woman character dancing and wearing.jpg" type=JPEG resolution=1280x720 size=0
09:55:30-712135 INFO Processed: images=1 time=10.77 its=0.93 memory={'ram': {'used': 3.26, 'total': 47.05}, 'gpu': {'used': 13.96, 'total': 23.99}, 'retries': 0, 'oom': 0}
09:56:00-496385 DEBUG Server: alive=True jobs=1 requests=76 uptime=925 memory=3.26/47.05 backend=Backend.DIFFUSERS state=idle
first try to reduce variables - don't use hypertile at the same time as stable-fast (in my case it does work, but the rule of troubleshooting is always to reduce variables) and try to come up with as simple a reproducible scenario as possible.
The freezing doesn't seem to be happening anymore, not sure how that was fixed, but the console errors remain. I disabled hypertile, removed all loras from the prompt, and turned off adetailer. sdnext.log
i cannot reproduce. try setting inference mode to default no_grad?
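For reference, a minimal sketch (plain torch, not SD.Next internals) of the difference that setting toggles:

```python
# Minimal sketch, assuming nothing beyond plain torch: the difference between
# the two inference context managers the setting switches between.
import torch

model = torch.nn.Linear(4, 4).eval()
x = torch.randn(1, 4)

# inference_mode produces "inference tensors" that can never be used with
# autograd afterwards; some tracing/compile paths are stricter about these.
with torch.inference_mode():
    y_inference = model(x)

# no_grad only disables gradient tracking inside the block; outputs are
# ordinary tensors, which can be the safer default when a compiler traces them.
with torch.no_grad():
    y_no_grad = model(x)
```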
any updates?
Here's with no_grad, seems to be the same error: sdnext.log
Sorry for taking so long on this, been pretty busy and didn't want to deal with it :/
I remembered that you said to disable hypertile, so here's another run with hypertile disabled. sdnext.log
Model compile: task='Stable-fast' config={'memory_format': torch.contiguous_format, 'enable_jit': True, 'enable_jit_freeze': True, 'preserve_parameters': True, 'enable_cnn_optimization': True, 'enable_fused_linear_geglu': True, 'prefer_lowp_gemm': True, 'enable_xformers': False, 'enable_cuda_graph': True, 'enable_triton': False, 'trace_scheduler': False} time=0.02
i just noticed that triton is not available - stable-fast works without triton in theory only, i never actually waited long enough for compile to finish as it's incredibly slow without it. can you try pip install triton from your venv?
also, you have model offload enabled, which means the model is on cpu at the time of compile. can you try with model offloading disabled?
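Roughly what that looks like at the API level - a sketch only, assuming stable-fast's diffusion_pipeline_compiler module and a hypothetical checkpoint path; SD.Next drives this through its offload/compile settings rather than direct calls:

```python
# Sketch under assumptions: stable-fast's diffusion_pipeline_compiler API and
# a hypothetical checkpoint path; SD.Next normally handles all of this itself.
import torch
from diffusers import StableDiffusionXLPipeline
from sfast.compilers.diffusion_pipeline_compiler import compile as sfast_compile
from sfast.compilers.diffusion_pipeline_compiler import CompilationConfig

pipe = StableDiffusionXLPipeline.from_single_file(
    "models/Stable-diffusion/some_sdxl_model.safetensors",  # hypothetical path
    torch_dtype=torch.float16,
)

# With model offload (medvram) enabled, the weights stay on the CPU until they
# are needed, so the compiler would trace a CPU-resident model. Disabling
# offload effectively means the whole pipeline sits on the GPU before compile.
pipe.to("cuda")

config = CompilationConfig.Default()
config.enable_cuda_graph = True
config.enable_triton = False  # triton is not available on Windows

pipe = sfast_compile(pipe, config)
```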
and pls check if same error occurs with different sdxl models?
a bit of background: torch inference mode or no_grad are supposed to set all params to no-grad, but they can only do that for known and initialized params. it seems the model you're loading includes some params that are not known, so they are left as-is, and the compile later fails because it requires all params to be in no-grad mode.
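One way to see whether that is the case - a hypothetical diagnostic in plain torch that lists any parameters of the loaded pipeline still tracking gradients:

```python
# Hypothetical diagnostic: list parameters the loaded pipeline still tracks
# gradients for; stable-fast's jit freeze expects every param to be no-grad.
def params_requiring_grad(pipe):
    stray = []
    for name, module in (("unet", pipe.unet), ("vae", pipe.vae)):
        for pname, param in module.named_parameters():
            if param.requires_grad:
                stray.append(f"{name}.{pname}")
    return stray

# Example: print(params_requiring_grad(pipe))
# If anything shows up, forcing those params off before compiling is a
# possible (unofficial) workaround:
#   for p in pipe.unet.parameters():
#       p.requires_grad_(False)
```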
pip install triton returns two errors: "Could not find a version that satisfies the requirement triton (from versions: none)" and "No matching distribution found for triton".
My understanding is that triton only works on Linux, and I'm using Windows 11 professional. Perhaps it's time to install WSL2? Looking into it, it seems a bunch of torch optimizations only work on linux as well, mainly due to triton.
Here's the log when disabling medvram: sdnext.log
Note that this was also with Juggernaut. Here's a log with medvram re-enabled but Juggernaut loaded: sdnext.log
My understanding is that triton only works on Linux, and I'm using Windows 11 professional. Perhaps it's time to install WSL2? Looking into it, it seems a bunch of torch optimizations only work on linux as well, mainly due to triton.
true. i suggested triton and forgot for a sec you're on windows. but yes, in general, i have had zero downsides with wsl2, it's my daily environment. the only issue is that you do need to be somewhat familiar with linux in general. not much, but still.
Here's the log when disabling medvram
aaa, finally something different :) but not that helpful, this is a generic error stating that something is wrong between torch and the gpu. i typically run into those problems if i update the device driver but don't reboot, and stuff like that.
i just noticed that triton is not available - stable-fast works without triton in theory only, i never actually waited long enough for compile to finish as it's incredibly slow without it.
FWIW, I use stable-fast in Windows without Triton. The initial compile is a bit slow, but it does work and offers a decent speed boost. However, I find it a bit awkward in general; it crashes if the output resolution is changed too much, it crashes with some (all?) Loras, etc.
On one occasion I ran into a similar problem as the OP, but unfortunately I can't for the life of me remember what it was that was causing the trouble. I'll reply again if I remember.
this one kinda fell through the cracks, what's the current status? regarding crashing when output resolution or lora changes - well, that's a limitation of pretty much all actual compile methods - the expected behavior is that it recompiles a new model execution path for the changed parameters. but if your compile is not stable to start with, frequent recompiles are only going to make it worse.
sure, it should not crash on recompile. but anytime you need to change resolution or lora, compile is probably not the best option. this applies to tensorrt, torch compile, etc. - pretty much all of them.
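For illustration only, the same behavior with plain torch.compile rather than stable-fast: with dynamic shapes disabled, every new resolution means a fresh compile.

```python
# Illustration with torch.compile (not stable-fast): a new input shape
# triggers a recompile when dynamic shapes are disabled.
import torch

model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).cuda().eval()
compiled = torch.compile(model, mode="reduce-overhead", dynamic=False)

with torch.no_grad():
    compiled(torch.randn(1, 4, 64, 64, device="cuda"))  # compiles for 64x64
    compiled(torch.randn(1, 4, 64, 64, device="cuda"))  # cached graph is reused
    compiled(torch.randn(1, 4, 96, 96, device="cuda"))  # new shape, recompiles
```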
It's been a long while since I worked with stable-fast; I want to return to it at some point, but I've been too busy.
What would be nice is if the compile could be saved somehow and then if the correct models/loras/resolutions are detected, it just grabs that saved compile. I don't know if that's even possible, but it is something I've been thinking about.
What would be nice is if the compile could be saved somehow
some can. torch-trace results can. zluda does compile and saves the result. stable-fast cannot. even worse, it seems stable-fast has been abandoned by its author. too bad, as it was really promising.
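For reference, this is roughly what saving a trace result looks like in plain torch; stable-fast has no comparable way to persist its compiled state:

```python
# Sketch of saving and reloading a torch-trace result; this is the part
# stable-fast cannot do.
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
example = torch.randn(1, 8)

traced = torch.jit.trace(model, example)
torch.jit.save(traced, "traced_model.pt")

# A later run can load the serialized graph directly and skip tracing.
restored = torch.jit.load("traced_model.pt")
out = restored(example)
```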
this is the very first sentence on the stable-fast repo:
Active development on stable-fast has been paused.
based on that alone, i cannot proceed much here.
Issue Description
Trying stable-fast for the first time (apparently I hadn't had it installed properly before), on a fresh --reinstall. Latest dev, info below. With SDXL, it returns a bunch of errors and either freezes or generates a blank image (not even black, just... non-existent).
It seems to work fine with SD 1.5.
Version Platform Description
Python 3.10.6 (I know... need to update)
Windows 11 Professional
Dev a0fd8210
RTX 3070 8GB
Torch 2.2.1, CUDA 12.1, Cudnn 8801
Diffusers 0.27.0, Gradio 3.43.2
Mozilla Firefox
Relevant log output
Backend: Diffusers
Branch: Dev
Model: SD-XL
Acknowledgements