lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0
7.33k stars 710 forks

Flux doesn't work on Macbook Pro M1 Max #1103

Closed achiever1984 closed 2 hours ago

achiever1984 commented 4 weeks ago

Hello.

When I try to generate an image in Flux mode using the flux1-dev-bnb-nf4.safetensors model on my macbook, I get the following error:

0%| | 0/20 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

  • Avoid using tokenizers before the fork if possible
  • Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

0%| | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules_forge/main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/txt2img.py", line 110, in txt2img_function
    processed = processing.process_images(p)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/processing.py", line 809, in process_images
    res = process_images_inner(p)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/processing.py", line 952, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/processing.py", line 1323, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/k_diffusion/sampling.py", line 128, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/modules/sd_samplers_cfg_denoiser.py", line 186, in forward
    denoised, cond_pred, uncond_pred = sampling_function(self, denoiser_params=denoiser_params, cond_scale=cond_scale, cond_composition=cond_composition)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/sampling/sampling_function.py", line 339, in sampling_function
    denoised, cond_pred, uncond_pred = sampling_function_inner(model, x, timestep, uncond, cond, cond_scale, model_options, seed, return_full=True)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/sampling/sampling_function.py", line 284, in sampling_function_inner
    cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/sampling/sampling_function.py", line 254, in calc_cond_uncond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/modules/k_model.py", line 45, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/nn/flux.py", line 393, in forward
    out = self.inner_forward(img, img_ids, context, txt_ids, timestep, y, guidance)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/nn/flux.py", line 350, in inner_forward
    img = self.img_in(img)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/vladimirkrutikov/stable-diffusion-webui-forge/backend/operations.py", line 112, in forward
    return torch.nn.functional.linear(x, self.weight, self.bias)
RuntimeError: linear(): input and weight.T shapes cannot be multiplied (4032x64 and 1x98304)

What can I do to fix this?
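For context, one plausible reading of the error (an assumption on my part, not confirmed in this thread): torch.nn.functional.linear computes x @ weight.T, so the input's last dimension must match the weight's last dimension. Flux's img_in layer is a Linear(64, 3072), whose weight should be a (3072, 64) matrix of 196608 values; the NF4 checkpoint stores it as a flat packed blob of 98304 bytes (two 4-bit values per byte). If the backend never dequantizes that blob (bitsandbytes has no MPS support), linear sees a (1, 98304) weight and fails exactly as above. A dependency-free sketch of the shape arithmetic:

```python
def linear_shapes_ok(input_shape, weight_shape):
    # torch.nn.functional.linear computes x @ weight.T, so the input's
    # last dim must equal the weight's last dim.
    return input_shape[-1] == weight_shape[-1]

# The failing call from the traceback: input (4032, 64), weight (1, 98304).
assert not linear_shapes_ok((4032, 64), (1, 98304))

# What img_in plausibly expects: Linear(in=64, out=3072) -> weight (3072, 64).
assert linear_shapes_ok((4032, 64), (3072, 64))

# 3072 * 64 parameters packed at two 4-bit values per byte = 98304 bytes,
# matching the flat NF4 blob in the error message.
assert 3072 * 64 // 2 == 98304
```

The byte count lining up with the second dimension of the reported weight shape is what suggests the quantized blob was passed to linear un-decoded.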

ryanpage8 commented 4 weeks ago

Same issue

yumo7031 commented 4 weeks ago

+1

EuroCluddy commented 4 weeks ago

Yep. Same here unfortunately. (using Mac Studio M1 Max)

carlosiimolina commented 4 weeks ago

same here!

whz739723619 commented 4 weeks ago

same issue!

hochonin93 commented 3 weeks ago

same issue!

beerlogoff commented 3 weeks ago

I have the same issue on M2 Pro.

I use webui version from February 5th Python v3.10 Flux flux1-dev-bnb-nf4-v2.safetensors & flux1-dev-bnb-nf4.safetensors

rom1win commented 3 weeks ago

Strangely I have been having the exact same error with my AMD card on Linux.

achiever1984 commented 3 weeks ago

https://github.com/lllyasviel/stable-diffusion-webui-forge/pull/1264

Nothing changes

conornash commented 3 weeks ago

@achiever1984 I can get it to work with the official Flux Dev release from Huggingface https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main

I agree that the flux1-dev-bnb-nf4.safetensors doesn't work - apologies if this doesn't help you.

achiever1984 commented 3 weeks ago

> @achiever1984 I can get it to work with the official Flux Dev release from Huggingface https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main
>
> I agree that the flux1-dev-bnb-nf4.safetensors doesn't work - apologies if this doesn't help you.


alexpulich commented 3 weeks ago

@achiever1984 please try t5xxl_fp8_e4m3fn.safetensors instead of fp16. For me it started working after pulling changes from @conornash on MBP M3 Pro


UPD. ah, never mind, I see @conornash used fp16 encoder as well

achiever1984 commented 3 weeks ago

> @achiever1984 please try t5xxl_fp8_e4m3fn.safetensors instead of fp16. For me it started working after pulling changes from @conornash on MBP M3 Pro
>
> UPD. ah, never mind, I see @conornash used fp16 encoder as well


hochonin93 commented 3 weeks ago

M2 shows this error: TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
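This float64 error is a genuine MPS limitation: the backend has no float64 kernels, so any float64 tensor (NumPy arrays default to float64, for instance) must be downcast to float32 before being moved to the device. A minimal sketch of the idea; the mapping table is my own illustration, not a torch API:

```python
# MPS supports float32 but not float64/complex128; a common workaround is to
# downcast before calling .to("mps"). This table is illustrative only.
MPS_SAFE_DTYPES = {
    "float64": "float32",
    "complex128": "complex64",
}

def mps_safe(dtype_name: str) -> str:
    """Return an MPS-compatible dtype name, passing supported dtypes through."""
    return MPS_SAFE_DTYPES.get(dtype_name, dtype_name)
```

In practice this is the kind of substitution the @conornash flux.py changes mentioned later in the thread would need to make wherever float64 tensors are created.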

l0stl0rd commented 3 weeks ago

It is strange and very checkpoint dependent. For example, with the dev fp8 and the fp16 t5xxl I get this, which makes no sense:

TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

When I deselect the fp16 T5: AssertionError: You do not have T5 state dict!

Also, the fp16 T5 should be fine; it is about twice the size of the fp8 one.

So yes, I get the same error no matter whether I select fp16 or fp8, which makes no sense.

One more thing: turning this option on or off does not make a difference: Enable T5 (load T5 text encoder; increases VRAM use by a lot, potentially improving quality of generation; requires model reload to apply)
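The "twice the size" sanity check can be made concrete: fp16 stores 2 bytes per parameter and fp8 stores 1, so for the same parameter count the fp16 checkpoint is exactly twice as large. A small sketch, with the T5-XXL parameter count (~4.7B) as an approximate assumption:

```python
def checkpoint_gib(n_params: float, bytes_per_param: int) -> float:
    """Uncompressed checkpoint size in GiB for a given dtype width."""
    return n_params * bytes_per_param / 2**30

# T5-XXL encoder has roughly 4.7e9 parameters (approximate figure).
fp16_size = checkpoint_gib(4.7e9, 2)  # fp16: 2 bytes per parameter
fp8_size = checkpoint_gib(4.7e9, 1)   # fp8: 1 byte per parameter

# Regardless of the exact parameter count, the ratio is exactly 2.
assert abs(fp16_size / fp8_size - 2.0) < 1e-9
```

So a t5xxl file roughly double the size of the fp8 one is indeed consistent with a genuine fp16 encoder.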

aimerib commented 2 weeks ago

Just checked out @conornash's branch and for the first time I was able to load a Flux model on my Apple M1 Max 32GB. ❤️

barto95 commented 1 week ago

Tested the forge branch from @conornash, but same problem:

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
^CInterrupted with signal 2 in <frame at 0x31b61f840, file '/Users/barto/stable-diffusion-webui-forge-fork/modules_forge/main_thread.py', line 43, code loop>

With fp8 or fp16, same problem.

Mac info: Apple M1 Max, 64 GB memory

:(

OrenMoveo commented 1 week ago

+1

Same issue on an Apple M1 MacBook Pro.

MigCurto commented 1 week ago

Had the same issue on an M2 Ultra (TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.). Replaced my flux.py file with the one from @conornash and it started working, though I didn't test exhaustively. Hopefully it will be merged into the main branch, but anyone can replace that file:

Download the RAW file and replace it accordingly: https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/643c1089ca150294d96470b6d5f2bd73e0bd3da3/backend/nn/flux.py#L1
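To fetch that file from the command line, the GitHub blob URL has to be converted to its raw counterpart: host changed to raw.githubusercontent.com, the /blob/ segment dropped, and any #L… line anchor removed. A small helper sketch using the standard GitHub URL layout:

```python
def blob_to_raw(url: str) -> str:
    # github.com/<owner>/<repo>/blob/<ref>/<path>
    #   -> raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
    url = url.split("#", 1)[0]  # drop line anchors like #L1
    return (url
            .replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
            .replace("/blob/", "/", 1))

raw = blob_to_raw(
    "https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/"
    "643c1089ca150294d96470b6d5f2bd73e0bd3da3/backend/nn/flux.py#L1"
)
# raw is now the direct-download URL suitable for curl -o backend/nn/flux.py
```

The resulting URL can be passed to curl or wget, writing the output over backend/nn/flux.py in the Forge install.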


Tested with Flux Dev. AFAIK NF4 will not work; I'm not an expert, but it is something about not being compatible with the GPU, and at least for SwarmUI some Flux spins seem dependent on bitsandbytes being ported to Macs.


EDIT2: Seems to be working with GGUF as well (but I couldn't make it work with Schnell). Didn't notice any speed improvements with Q8, though.

YofarDev commented 5 days ago

After replacing flux.py, I get this error:

NotImplementedError: The operator 'aten::rshift.Scalar' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

There is already export PYTORCH_ENABLE_MPS_FALLBACK=1 in webui-macos-env.sh; I don't know if it's supposed to be added somewhere else. I'm using Stability Matrix, so I'm going to try with a vanilla installation.
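One thing worth checking (my suggestion, not something established in the thread): the variable has to be present in the environment of the Python process that imports torch, so an export in webui-macos-env.sh only helps if that script actually runs in the launch chain, and a wrapper like Stability Matrix may bypass it. A defensive sketch for the very top of a launch script, before torch is imported anywhere:

```python
import os

# Must be set before torch initializes the MPS backend; setdefault preserves
# any value the user already exported (e.g. via webui-macos-env.sh).
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
```

With the fallback active, unsupported ops like aten::rshift.Scalar run on the CPU instead of raising NotImplementedError, at some speed cost.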

MigCurto commented 3 days ago

Don't want to hijack this thread, but it's relevant I guess: for some reason, after updating today (git pull), Flux stopped working as it should. Images can't resolve (noisy), same settings as before.

EDIT: My bad, it seems things are changing fast, and at least for Flux on Forge you need to check things properly; one day it works, the next day it doesn't (Euler a).