cchance27 opened this issue 2 days ago
Just checked: the fp32 crash happens when it calls emb = self.time_embedding(t_emb, timestep_cond)
at line 588 in custom_cogvideox_transformer_3d.py.
What's odd is that I only see it passing in t_emb, which is float32, and timestep_cond is None...
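For reference, here's a minimal standalone sketch of what I think is happening in that call, using the diffusers classes directly; the channel sizes, the bf16 stand-in dtype, and the explicit cast are my assumptions, not the wrapper's actual code:

```python
import torch
from diffusers.models.embeddings import TimestepEmbedding, Timesteps

device = "mps" if torch.backends.mps.is_available() else "cpu"

# The sinusoidal timestep projection always comes out as float32...
time_proj = Timesteps(num_channels=256, flip_sin_to_cos=True, downscale_freq_shift=0)
# ...while the embedding MLP can sit in a different dtype (bf16 here as a stand-in).
time_embedding = TimestepEmbedding(in_channels=256, time_embed_dim=512).to(device, torch.bfloat16)

t_emb = time_proj(torch.tensor([999], device=device))      # float32
# Hypothetical guard: cast t_emb to the MLP's weight dtype before the call,
# so MPS never has to mix dtypes inside the linear/add kernels.
t_emb = t_emb.to(next(time_embedding.parameters()).dtype)
emb = time_embedding(t_emb, None)                           # timestep_cond is None
print(emb.dtype)
```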
I'm not sure if you insist on running fp32 or not, but I have had success running kijai's 5b i2v and t2v models at bf16 on my MacBook Pro. This is the first time I could run a CogVideo workflow on my Mac since its release! :D
Could be because I just removed the autocast when using bf16 and fp16 too; I figured it's only needed for fp8 and GGUF now.
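Roughly what I mean, as a sketch rather than the wrapper's literal code (the function name and the transformer call in the comment are just illustrative):

```python
import contextlib
import torch

def maybe_autocast(device_type, compute_dtype, weights_are_fp8_or_gguf):
    # Only fp8/GGUF weights need autocast so their ops run in a higher
    # precision; plain fp16/bf16 models run without any autocast wrapper.
    # (MPS autocast also needs a fairly recent torch build.)
    if weights_are_fp8_or_gguf:
        return torch.autocast(device_type=device_type, dtype=compute_dtype)
    return contextlib.nullcontext()

# e.g. with maybe_autocast("mps", torch.bfloat16, weights_are_fp8_or_gguf=False):
#          out = transformer(latents, t_emb)   # hypothetical call
```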
Well, I tried all 3:
fp16: not supported
fp32: the crash above
bf16: the other crash above, on MPS
By all means, I'm up for trying things.
This was all with the 1.5 GGUF i2v model in the dropdown.
How about the other models? 1.5 does not work with fp16 on any hardware currently.
Hadn't tried; I'll try the i2v 5b GGUF with the different dtypes and see if that works, since the person above mentions they had it working on 5b (they didn't specify 1.5 or 1.0)...
I did a fresh pull of the repo just now and confirmed 1.5 still gives the same errors as above. I'm waiting on the 5b to download, HF is being slow.
Another thing to try is the "comfy" attention mode that's now available (that is, if you get past the temb part); comfy has set it up to be more compatible in general.
So on 5b_I2V_GGUF_Q4_0 I don't get the really bad mps.add crash that bombs Python itself, but all 3 dtypes still error out in the sampler.
fp16 is causing...
File "/Volumes/2TB/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 256, in forward
norm_hidden_states, norm_encoder_hidden_states, gate_msa, enc_gate_msa = self.norm1(
TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
Not sure why it's trying fp8 when set to fp16?
fp32 gives
File "/Volumes/2TB/AI/ComfyUI/venv/lib/python3.11/site-packages/diffusers/models/normalization.py", line 456, in forward
hidden_states = self.norm(hidden_states) * (1 + scale)[:, None, :] + shift[:, None, :]
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float8_e4m3fn and Float
bf16 gives
File "/Volumes/2TB/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 256, in forward
norm_hidden_states, norm_encoder_hidden_states, gate_msa, enc_gate_msa = self.norm1(
RuntimeError: Promotion for Float8 Types is not supported, attempted to promote Float8_e4m3fn and BFloat16
I'm using the (down)load GGUF node, and it seems you didn't add the attention modes to that one, only the 2: sdpa and sage.
These GGUF models actually use fp8 for some of the weights currently.
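If that's the case, I wonder whether something like this would work as a workaround: upcast any float8 tensors to fp16 while the model is still on the CPU, before it ever gets moved to MPS. Just a sketch on my side, assuming the fp8 weights show up as regular parameters (in the GGUF loader they may well be plain tensors instead):

```python
import torch

def upcast_fp8_weights(model: torch.nn.Module, target_dtype=torch.float16):
    # MPS can't even hold Float8_e4m3fn (see the tracebacks above), so any
    # fp8 tensors have to be upcast while the model is still on the CPU,
    # before calling .to("mps").
    fp8_dtypes = {torch.float8_e4m3fn, torch.float8_e5m2}
    for _, param in model.named_parameters():
        if param.dtype in fp8_dtypes:
            param.data = param.data.to(target_dtype)
    return model
```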
Swapped to the non-GGUF (down)load using THUDM/5b-i2v ... and it seems to be going on bf16, albeit ... SLOW [07:02<2:49:03, 422.66s/it], but it didn't crash.
Ok, I'll avoid using the GGUFs then. I guess it's likely that fp8 will break things on Macs, which is disappointing. Let me get some other models downloaded as non-GGUF and see if they work.
Won't that cause issues for other GPUs that don't support fp8 as well, or does NVIDIA just autocast it internally while MPS doesn't?
Fp8 is very much limited by hardware support in any case. I have also now added support for torchao quantization, but I have no clue if they support MPS at all.
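In case it helps anyone on Apple silicon, a crude way to probe what the MPS backend will even accept (float8 fails right at tensor creation, exactly like the errors above); treat this as a rough check, not an authoritative capability test:

```python
import torch

def mps_supports(dtype: torch.dtype) -> bool:
    # Try a tiny allocation plus one op on the MPS device; the backend raises
    # TypeError/RuntimeError for dtypes it can't handle (e.g. Float8_e4m3fn).
    if not torch.backends.mps.is_available():
        return False
    try:
        x = torch.ones(2, dtype=dtype, device="mps")
        _ = x + x
        return True
    except (TypeError, RuntimeError):
        return False

for dt in (torch.float16, torch.bfloat16, torch.float32, torch.float8_e4m3fn):
    print(dt, mps_supports(dt))
```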
HAHA, don't think so sadly.
Also just confirmed: 1.5 i2v bf16 is also working on the non-GGUF version, so it seems it's the various GGUFs blowing things up. On non-GGUF 1.5 I get... 4%|███▎ | 1/25 [02:18<55:31, 138.83s/it]
But ya... even 1.5 is still pretty damn slow, not 422s/it but 138s/it XD
Question... on 1.5 I see INFO:ComfyUI-CogVideoXWrapper.pipeline_cogvideox:Sampling 53 frames in 13 latent frames at 720x480 with 25 inference steps
On 1.0 it showed INFO:ComfyUI-CogVideoXWrapper.pipeline_cogvideox:Sampling 49 frames in 13 latent frames at 720x480 with 25 inference steps
How come 1.5 showed more frames than the sampler was set to, as I didn't change it from the default?
Ya, surprised to see the GGUF use fp8; isn't the standard for Q4 ... Q4+FP16?
With 1.5 the first latent is noisy, so it's padded and later removed. One latent has 4 frames as it's packed temporally too.
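So if I follow that, the arithmetic works out like this (my reading of it, not the pipeline's exact code):

```python
# 4 frames are packed into each latent frame temporally.
temporal_compression = 4
requested_frames = 49                                  # the sampler default
# 1.5 pads one extra (noisy) latent up front, which adds 4 frames to what is
# sampled; that latent is dropped again later, so the output is still 49.
sampled_frames_1_5 = requested_frames + temporal_compression
print(sampled_frames_1_5)                              # 53, matching the 1.5 log
```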
So with fp32... the sampler fails with... (using the GGUF i2v model). It seems that somewhere you're doing an operation where the element types don't match (not inside an autocast?). Sadly it's crashing out at the shader graph, so it doesn't tell me where in the pipeline it's failing; I'll try opening Comfy in VSCode later to see if I can step through to where it's crashing...
When set to bf16 I get a different issue; it seems that on Macs it doesn't support a linear non-float bias... which I'm guessing means that MPS doesn't support a bfloat16 linear bias, for some reason.
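To actually find where the element types diverge before the Metal shader-graph error swallows the traceback, I'll probably try something like this generic hook (not wrapper-specific, just a debugging sketch):

```python
import torch

def log_dtype_mismatches(model: torch.nn.Module):
    # Register forward pre-hooks that compare each submodule's weight dtype
    # against the dtypes of the tensors it receives, so the offending op is
    # printed before the MPS shader-graph assertion fires.
    def make_hook(name):
        def hook(module, args):
            in_dtypes = [a.dtype for a in args if torch.is_tensor(a)]
            weight = getattr(module, "weight", None)
            if torch.is_tensor(weight) and any(d != weight.dtype for d in in_dtypes):
                print(f"{name}: inputs {in_dtypes} vs weight {weight.dtype}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_pre_hook(make_hook(name))
    return model
```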