Open rottitime opened 7 months ago
Here's what I found regarding your problem. Let me know if it works!
Troubleshoot Error: "I am using Mac, the speed is very slow."
Some Mac users may need `--disable-offload-from-vram` to speed up model loading.
Thank you @foreignstyle for the suggestion. Sadly, it didn't make any difference.
(fooocus) jaspaul@MacBook-Pro Fooocus % python entry_with_update.py --disable-offload-from-vram
Fast-forward merge
Update succeeded.
[System ARGV] ['entry_with_update.py', '--disable-offload-from-vram']
Python 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ]
Fooocus version: 2.1.844
Running on local URL: http://127.0.0.1:7865
To create a public link, set `share=True` in `launch()`.
Total VRAM 16384 MB, total RAM 16384 MB
Set vram state to: SHARED
Device: mps
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 1100476089425728703
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors
Request to load LoRAs [['None', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0], ('sdxl_lcm_lora.safetensors', 1.0)] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sdxl_lcm_lora.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors] with 788 keys at weight 1.0.
Requested to load SDXLClipModel
Loading 1 new model
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] pikachu, glowing, shiny, bright, detailed, very intricate, cinematic, stunning, winning, highly colorful, deep colors, inspired, original, fine detail, enhanced, color, perfect, vibrant, symmetry, vivid, coherent, sharp focus, complex, extremely quality, futuristic, professional, creative, appealing, cheerful, amazing, atmosphere, directed, dramatic, thought
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, vibrant, magic, vivid colors, intricate, elegant, highly detailed, professional, artistic, cinematic,, singular, clear, pristine, thoughtful, inspired, charismatic, beautiful, illuminated, pretty, attractive, colorful, best, dramatic, perfect, sharp focus, divine, amazing, astonishing, marvelous, flowing, enormous, luxury, very inspirational, cool
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 20.71 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 54.27 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
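The warning above is harmless here, but its second suggestion can be applied from Python before anything forks; a minimal sketch (the variable name comes straight from the warning text):

```python
import os

# Disable tokenizer parallelism before any process forking happens,
# so forked workers cannot deadlock on the thread pool.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting `export TOKENIZERS_PARALLELISM=false` in the shell before running `python entry_with_update.py` has the same effect.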
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [06:41<00:00, 50.13s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.35 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 461.61 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 58.31 seconds
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [07:33<00:00, 56.64s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.91 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 516.92 seconds
Total time: 1001.73 seconds
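For reference, the per-stage times reported in the log above account for essentially the whole run, and show that sampling (not model loading) dominates; a quick sanity check, with the values copied from the log:

```python
# Numbers taken from the console log above (seconds)
preparation = 20.71
image_1 = 461.61   # includes the 54.27 s model move and 8 steps at ~50.13 s/it
image_2 = 516.92

total = preparation + image_1 + image_2
print(f"{total:.2f}")  # 999.24, close to the reported 1001.73 s total
```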
I also have the same configuration. The following command got it to 30.40 s/it, but the improvement is not significant:
python entry_with_update.py --always-cpu
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible

Might be memory, but it seems others have chimed in with the same setup. I have 64 GB, and set to Extreme Speed it pumps out like 4 renders in a few minutes.
I have the same problem.
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8018492891930229499
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, elegant, intricate, cinematic, new classic, epic composition, colorful, mystical, scenic, rich deep colors, inspired, illuminated, amazing, very inspirational, shiny, smart, thought inspiring, wonderful, dramatic, artistic, color, perfect, dynamic light, great, atmosphere, marvelous,, luxury, beautiful, gorgeous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, vivid colors, sharp focus, elegant, highly detailed, innocent, formal, cute, determined, color, cool, background, dramatic light, professional, charming, best, pretty, sunny, illuminated, attractive, beautiful, epic, stunning, gorgeous, breathtaking, creative, positive, artistic, loving, healthy, vibrant, passionate, lovely, relaxed
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 6.45 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 83.57 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible

Extremely slow; estimated time 1:14:48.
I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.
python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic
Thanks! I've seen an improvement, but it's still far from desirable: 36.39 s/it.
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 7339689169583121557
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, attractive detailed, charming, delightful, professional, highly coherent, color excellent composition, dramatic calm intense cinematic light, beautiful detail, aesthetic, very inspirational, rich deep colors, inspired, lovely, cute, adorable, marvelous, intricate, epic, elegant, sharp focus, fabulous atmosphere, amazing, thought, iconic, perfect background, gorgeous, stunning, enormous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, cinematic, ambient, modern, structured, vivid, beautiful, expressive, pretty, attractive, classy, inspired, rich, color, illuminated, light, saturated, designed, deep clear, full, coherent, creative, positive, loving, vibrant, perfect, focused, lovely, cute, best, detail, bright, fabulous
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 7.58 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 10.77 seconds
100%|█████████████████████████████████████████████████| 30/30 [17:58<00:00, 35.96s/it]
Requested to load AutoencoderKL
Loading 1 new model
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1383.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
100%|█████████████████████████████████████████████████| 30/30 [18:56<00:00, 37.90s/it]
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1481.11 seconds
Total time: 2884.07 seconds
Thanks @TattyDon. Also with an Apple M1, 16 GB, this reduced iteration time from the original ~50 sec down to 10-12 sec, instead of the ~35 sec observed by @99kpv.
Is that the only thing you changed, or did you make any additional configuration changes? I feel like we should start a short discussion collecting all the best tips for improving performance on Apple silicon.
That's all I have changed. It's down to about 20 seconds/it for me. Not perfect, but also not unusable (M1).
Has anyone here heard of Apple MLX? https://github.com/ml-explore/mlx
I'm tired of using these general-purpose, NVIDIA-oriented frameworks and seeing people on Apple Silicon surprised that their computers are not performing as expected.
Someone should break the status quo and try implementing this framework into their LLMs and apps :D Cheers!
python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic
I don't have Apple silicon; my Mac is an Intel-based hackintosh.
100%|██████████████████████████████████████████████████| 6/6 [01:40<00:00, 16.72s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.07 seconds
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: /Users/alex/Fooocus/outputs/2024-06-17/log.html
Generating and saving time: 113.58 seconds
Total time: 119.55 seconds
Issue
I have installed on Apple Macbook Pro 2021, M1.
It is taking 15 to 30 minutes to create each image. People claim on various forums that it only takes 1 minute on the same device. Can anyone advise on how to speed up image creation?
Full Console Log
Setup
MPS enabled
'Metal Performance Shaders' is enabled after I followed the Accelerated PyTorch training on Mac guide, and I get the following output:
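As a quick sanity check that the MPS backend really is visible from Python, a defensive sketch (it assumes only that PyTorch may or may not be installed; the helper name is mine):

```python
import importlib.util

def mps_available() -> bool:
    """True if PyTorch is installed and reports a usable MPS backend."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch not installed in this environment
    import torch
    return torch.backends.mps.is_available()

print(mps_available())
```

If this prints False on an M1 with PyTorch installed, the slow speeds are expected: Fooocus would be falling back to CPU.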