LayoutLMv2 & LayoutXLM cannot run inference with the Half (float16) dtype on CPU

Hi,

I wanted to run inference with LayoutXLM with the model parameters cast to the Half (float16) dtype on CPU. (I tried the same thing on GPU and it worked.)

As I'm using Transformers from Hugging Face, I ran the following code:

import torch
from transformers import LayoutLMv2ForTokenClassification

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the weights directly in half precision
param_dtype = torch.float16
model_id = "pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512"
model = LayoutLMv2ForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype)
model.to(device)

Loading the model worked, but inference with the following code failed:

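# input_id, attention_mask, bbox and pixel_values are the preprocessed inputs (preprocessing not shown)
with torch.no_grad():
    output = model(
        input_ids=input_id.to(device),
        attention_mask=attention_mask.to(device),
        bbox=bbox.to(device),
        image=pixel_values.to(device),
    )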

Error message:

/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in layer_norm(input, normalized_shape, weight, bias, eps)
   2513             layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514         )
-> 2515     return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
   2516 
   2517 

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
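
The traceback ends inside `torch.layer_norm`, so the failure can probably be reproduced without LayoutLMv2 at all. Below is a minimal sketch of such a check, assuming only that PyTorch is installed. The helper name `layer_norm_supports` is hypothetical, and the float16 result depends on the installed PyTorch build, so no expected output is given for it.

```python
import torch


def layer_norm_supports(dtype: torch.dtype, device: str = "cpu") -> bool:
    """Return True if LayerNorm runs for `dtype` on `device` in this PyTorch build."""
    # A tiny LayerNorm is enough to hit the same kernel as the model does.
    layer = torch.nn.LayerNorm(8).to(device=device, dtype=dtype)
    x = torch.randn(2, 8, device=device).to(dtype)
    try:
        with torch.no_grad():
            layer(x)
        return True
    except RuntimeError:
        # For example: "LayerNormKernelImpl" not implemented for 'Half'
        return False


print("float32 on CPU:", layer_norm_supports(torch.float32))
print("float16 on CPU:", layer_norm_supports(torch.float16))
```

If the float16 line prints `False`, the limitation comes from the PyTorch CPU kernel rather than from the model code.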

It looks as if float32 is hard-coded somewhere in the LayoutLMv2 code path, although the traceback ends inside PyTorch's own layer_norm.

How to solve this issue? Thanks.
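
One possible workaround, sketched here under the assumption that full precision is acceptable on CPU, is to choose the parameter dtype from the device instead of always using float16. The helper name `pick_inference_dtype` is hypothetical and is not part of Transformers or PyTorch.

```python
import torch


def pick_inference_dtype(device: torch.device) -> torch.dtype:
    """Half precision on GPU, full precision everywhere else."""
    return torch.float16 if device.type == "cuda" else torch.float32


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
param_dtype = pick_inference_dtype(device)
print(device, param_dtype)

# Assumed usage with the loading code from the question:
# model = LayoutLMv2ForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype)
# model.to(device)
```

This keeps the float16 path on GPU, where it already works, and avoids the Half LayerNorm kernel on CPU at the cost of higher memory use there.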

microsoft / unilm

LayoutLMv2 & LayoutXLM cannot run inference with the Half (float16) dtype on CPU #1081