lllyasviel / Fooocus

Focus on prompting and generating
GNU General Public License v3.0

Apple Macbook Pro M1 extremely slow #1446

Open rottitime opened 7 months ago

rottitime commented 7 months ago

Issue

I have installed on Apple Macbook Pro 2021, M1.

It is taking 15 to 30 minutes to create each image, while people on various forums claim it takes only about a minute on the same device. Can anyone advise on how to speed up image creation?

Full Console Log

[Prompt Expansion] pikachu, cool color perfect colors, detailed, strong crisp, heroic, cinematic, dramatic, professional, symmetry, great composition, dynamic light, atmosphere, vivid, beautiful, emotional, highly detail, intricate, stunning, enhanced, inspired, colorful, shiny, transparent, lovely, cute, divine, elegant, coherent, pretty, best, novel, background, fine
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, epic, beautiful, elegant, intricate, cinematic, highly detailed, artistic, sharp focus, colorful, surreal, dramatic ambient light, open background, magic, cute, adorable, magical, thought, extremely coherent, charismatic, iconic, creative, positive, awesome, joyful, pure, very inspirational, bright, friendly, glowing, clear, color, inspired
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 31.29 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 70.69 seconds
100%|█████████████████████████████████████████████| 8/8 [07:27<00:00, 55.99s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.86 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 524.25 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 65.45 seconds
100%|█████████████████████████████████████████████| 8/8 [07:35<00:00, 56.88s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.09 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 526.07 seconds
Total time: 1085.09 seconds

Setup

Device: Apple M1 MacBook Pro, OS: 14.2 (23C64)
Memory: 16 GB
Model: juggernautXL_version6Rundiffusion.safetensors
Conda 23.11.0
Python 3.11.5

MPS enabled

'Metal Performance Shaders' is enabled: after following the "Accelerated PyTorch training on Mac" guide, I get the following output:

tensor([1.], device='mps:0')
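
For reference, this is essentially the verification snippet from the "Accelerated PyTorch training on Mac" guide (a minimal sketch; the expected output is the tensor shown above):

import torch

# Confirm the MPS backend is available, then run a trivial op on it.
if torch.backends.mps.is_available():
    x = torch.ones(1, device="mps")
    print(x)  # expected: tensor([1.], device='mps:0')
else:
    print("MPS device not found")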

Settings

(Screenshots attached showing the Style, Model, and Setting panels.)
foreignstyle commented 7 months ago

Here's what I found regarding your problem. Let me know if it works!

Troubleshoot Error: "I am using Mac, the speed is very slow."

Some Mac users may need --disable-offload-from-vram to speed up model loading.

rottitime commented 7 months ago

> Here's what I found regarding your problem. Let me know if it works!
>
> Troubleshoot Error: "I am using Mac, the speed is very slow."
>
> Some Mac users may need --disable-offload-from-vram to speed up model loading.

Thank you @foreignstyle for the suggestion. Sadly, it didn't make any difference:


(fooocus) jaspaul@MacBook-Pro Fooocus % python entry_with_update.py --disable-offload-from-vram 
Fast-forward merge
Update succeeded.
[System ARGV] ['entry_with_update.py', '--disable-offload-from-vram']
Python 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ]
Fooocus version: 2.1.844
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 16384 MB, total RAM 16384 MB
Set vram state to: SHARED
Device: mps
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 1100476089425728703
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors
Request to load LoRAs [['None', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0], ('sdxl_lcm_lora.safetensors', 1.0)] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sdxl_lcm_lora.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors] with 788 keys at weight 1.0.
Requested to load SDXLClipModel
Loading 1 new model
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] pikachu, glowing, shiny, bright, detailed, very intricate, cinematic, stunning, winning, highly colorful, deep colors, inspired, original, fine detail, enhanced, color, perfect, vibrant, symmetry, vivid, coherent, sharp focus, complex, extremely quality, futuristic, professional, creative, appealing, cheerful, amazing, atmosphere, directed, dramatic, thought
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, vibrant, magic, vivid colors, intricate, elegant, highly detailed, professional, artistic, cinematic,, singular, clear, pristine, thoughtful, inspired, charismatic, beautiful, illuminated, pretty, attractive, colorful, best, dramatic, perfect, sharp focus, divine, amazing, astonishing, marvelous, flowing, enormous, luxury, very inspirational, cool
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 20.71 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 54.27 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [06:41<00:00, 50.13s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.35 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 461.61 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 58.31 seconds
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [07:33<00:00, 56.64s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.91 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 516.92 seconds
Total time: 1001.73 seconds
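
A side note on the huggingface/tokenizers fork warning in the log above: it is harmless, but it can be silenced exactly as the message suggests, by setting the environment variable before the tokenizers library is first used. A minimal sketch (TOKENIZERS_PARALLELISM is the variable named in the warning itself):

import os

# Must be set before the tokenizers library is first used,
# i.e. before Fooocus imports its text-processing stack.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

Equivalently, from the shell: TOKENIZERS_PARALLELISM=false python entry_with_update.py --disable-offload-from-vram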
maxrx215 commented 7 months ago

I have the same configuration. The following command improved it slightly, to about 30.40 s/it, but the improvement is not significant: python entry_with_update.py --always-cpu

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:

eddyizm commented 7 months ago

Might be memory, but it seems others have chimed in with the same setup. I have 64 GB and, set to Extreme Speed, it pumps out about 4 renders in a few minutes.

99kpv commented 6 months ago

I have the same problem:

[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8018492891930229499
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, elegant, intricate, cinematic, new classic, epic composition, colorful, mystical, scenic, rich deep colors, inspired, illuminated, amazing, very inspirational, shiny, smart, thought inspiring, wonderful, dramatic, artistic, color, perfect, dynamic light, great, atmosphere, marvelous, luxury, beautiful, gorgeous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, vivid colors, sharp focus, elegant, highly detailed, innocent, formal, cute, determined, color, cool, background, dramatic light, professional, charming, best, pretty, sunny, illuminated, attractive, beautiful, epic, stunning, gorgeous, breathtaking, creative, positive, artistic, loving, healthy, vibrant, passionate, lovely, relaxed
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 6.45 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 83.57 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:

Extremely slow: estimated time 1:14:48.

TattyDon commented 6 months ago

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic
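
For context, the --unet-in-fp8-e5m2 flag presumably keeps UNet weights in 8-bit floating point (1 sign, 5 exponent, 2 mantissa bits), halving their memory footprint versus fp16, which matters on a 16 GB unified-memory machine. A rough illustration of the storage saving (a sketch only; assumes PyTorch 2.1+ with float8 dtypes, and the layer size is invented):

import torch

# A mock fp16 weight matrix and its fp8 (e5m2) cast.
w16 = torch.randn(1024, 1024, dtype=torch.float16)
w8 = w16.to(torch.float8_e5m2)

print(w16.element_size())  # 2 bytes per element
print(w8.element_size())   # 1 byte per element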

99kpv commented 5 months ago

> I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.
>
> python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks! I've seen an improvement, but it's still far from desirable: 36.39 s/it.

App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 7339689169583121557
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, attractive detailed, charming, delightful, professional, highly coherent, color excellent composition, dramatic calm intense cinematic light, beautiful detail, aesthetic, very inspirational, rich deep colors, inspired, lovely, cute, adorable, marvelous, intricate, epic, elegant, sharp focus, fabulous atmosphere, amazing, thought, iconic, perfect background, gorgeous, stunning, enormous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, cinematic, ambient, modern, structured, vivid, beautiful, expressive, pretty, attractive, classy, inspired, rich, color, illuminated, light, saturated, designed, deep clear, full, coherent, creative, positive, loving, vibrant, perfect, focused, lovely, cute, best, detail, bright, fabulous
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 7.58 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 10.77 seconds
100%|█████████████████████████████████████████████| 30/30 [17:58<00:00, 35.96s/it]
Requested to load AutoencoderKL
Loading 1 new model
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1383.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
100%|█████████████████████████████████████████████| 30/30 [18:56<00:00, 37.90s/it]
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1481.11 seconds
Total time: 2884.07 seconds

mlisovyi commented 5 months ago

> I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.
>
> python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks @TattyDon. Also on an Apple M1 with 16 GB, this reduced the iteration time from the original ~50 s/it down to 10-12 s/it, rather than the ~35 s/it observed by @99kpv.

eddyizm commented 4 months ago

> > I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.
> >
> > python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic
>
> Thanks @TattyDon. Also on an Apple M1 with 16 GB, this reduced the iteration time from the original ~50 s/it down to 10-12 s/it, rather than the ~35 s/it observed by @99kpv.

Is that the only thing you changed, or did you make additional configuration changes? I feel like we should start a short discussion collecting the best tips for improving performance on Apple Silicon.

TattyDon commented 4 months ago

That's all I have changed - it's down to about 20 seconds/it for me. Not perfect, but also not unusable (M1).

originalmagneto commented 4 months ago

Has anyone here heard of Apple MLX? https://github.com/ml-explore/mlx

I'm tired of using these general-purpose, NVIDIA-oriented frameworks and seeing people on Apple Silicon be surprised that their computers are not performing as expected.

Someone should break the status quo and try implementing this framework in their LLMs and apps :D Cheers!
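
For the curious, a minimal MLX sketch (an illustration only; assumes pip install mlx on Apple Silicon; mx.random.normal, lazy evaluation via mx.eval, and the @ matmul operator are part of MLX's documented core API):

import mlx.core as mx

# MLX arrays live in unified memory; computation is lazy and only
# materializes when mx.eval() (or printing) forces it.
a = mx.random.normal(shape=(4, 4))
b = a @ a.T
mx.eval(b)
print(b)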

TorAllex commented 1 month ago

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic

I don't have Apple Silicon; my Mac is an Intel-based hackintosh.

100%|██████████████████████████████████████████████████| 6/6 [01:40<00:00, 16.72s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.07 seconds
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: /Users/alex/Fooocus/outputs/2024-06-17/log.html
Generating and saving time: 113.58 seconds
Total time: 119.55 seconds