mlfoundations / open_clip

An open source implementation of CLIP.

RuntimeError: expected scalar type Float but found BFloat16 #916

Closed NineMeowICT closed 1 month ago

NineMeowICT commented 1 month ago
!!! Exception during processing!!! expected scalar type Float but found BFloat16
Traceback (most recent call last):
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 585, in encode_with_transformer
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 619, in text_transformer_forward
    x = r(x, attn_mask=attn_mask)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/open_clip/transformer.py", line 20, in forward
    x = F.layer_norm(x.to(torch.float32), self.normalized_shape, self.weight, self.bias, self.eps)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found BFloat16

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/ninemeow/File/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/media/ninemeow/File/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/media/ninemeow/File/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/nodes_v2.py", line 637, in condition
    _c, _uc = SUPIR_model.conditioner.get_unconditional_conditioning(cond, uncond)
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 190, in get_unconditional_conditioning
    c = self(batch_c)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 211, in forward
    emb_out = embedder(batch[embedder.input_key])
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 571, in forward
    z = self.encode_with_transformer(tokens.to(self.device))
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 587, in encode_with_transformer
    x = self.text_transformer_forward_batch_first(x, attn_mask=self.model.attn_mask)
  File "/media/ninemeow/File/ComfyUI/custom_nodes/ComfyUI-SUPIR/sgm/modules/encoders/modules.py", line 635, in text_transformer_forward_batch_first
    x = r(x, attn_mask=attn_mask)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/open_clip/transformer.py", line 20, in forward
    x = F.layer_norm(x.to(torch.float32), self.normalized_shape, self.weight, self.bias, self.eps)
  File "/media/ninemeow/File/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found BFloat16

I think it's quite strange, since the to() call should convert the input to the correct dtype.
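For context, the failure can be reproduced outside ComfyUI/SUPIR. This is a minimal sketch (my assumption about the mechanism, based on the traceback): open_clip's LayerNormFp32 casts the *input* to float32, but if the whole model was blanket-cast to bfloat16, the LayerNorm weight and bias are still bfloat16, so torch.layer_norm sees mixed dtypes and raises.

```python
import torch
import torch.nn.functional as F

# Blanket-cast a LayerNorm to bfloat16: its weight/bias become bfloat16.
ln = torch.nn.LayerNorm(8).to(torch.bfloat16)
x = torch.randn(2, 8, dtype=torch.bfloat16)

try:
    # Mirrors the failing line in open_clip/transformer.py: the input is
    # cast up to float32 while the parameters stay bfloat16.
    F.layer_norm(x.to(torch.float32), ln.normalized_shape,
                 ln.weight, ln.bias, ln.eps)
except RuntimeError as e:
    print(e)  # dtype-mismatch RuntimeError, as in the traceback above
```

So to() does convert the input; the mismatch comes from the layer's own parameters.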

rwightman commented 1 month ago

@NineMeowICT the weights are probably bfloat16 because the model was cast to lower precision and LayerNormFp32 was not handled correctly (or was disabled). There is code in open_clip that performs the cast to bfloat16/float16 correctly: https://github.com/mlfoundations/open_clip/blob/fc5a37b72d705f760ebbc7915b84729816ed471f/src/open_clip/factory.py#L269-L291

https://github.com/mlfoundations/open_clip/blob/fc5a37b72d705f760ebbc7915b84729816ed471f/src/open_clip/model.py#L396-L426

This looks like a usage bug, not an OpenCLIP bug.
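A sketch of the kind of fix this implies (my hypothetical helper, not open_clip's actual API; the real logic lives in the factory/model code linked above): cast the model to low precision, but restore normalization layers to float32 so they match the float32-cast input inside LayerNormFp32.

```python
import torch
import torch.nn as nn

def cast_keep_norms_fp32(model: nn.Module, dtype=torch.bfloat16) -> nn.Module:
    """Hypothetical helper: cast a model to low precision, then restore
    normalization layers to float32 so their weight/bias match an input
    that has been cast up to float32 before the norm."""
    model.to(dtype)
    for m in model.modules():
        if isinstance(m, (nn.LayerNorm, nn.GroupNorm)):
            m.float()  # weight/bias back to float32
    return model

model = cast_keep_norms_fp32(nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8)))
x = torch.randn(2, 8, dtype=torch.bfloat16)
h = model[0](x)  # Linear runs in bfloat16
# The fp32-input / fp32-params pairing now matches, so no RuntimeError:
out = torch.nn.functional.layer_norm(
    h.to(torch.float32), model[1].normalized_shape,
    model[1].weight, model[1].bias, model[1].eps)
print(out.dtype)  # torch.float32
```

The point is simply that the dtypes on both sides of the layer_norm call have to agree; open_clip achieves this by keeping norm parameters in fp32 when it builds LayerNormFp32 models.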

NineMeowICT commented 1 month ago

@rwightman Okay, I will check my code again. Thank you.