huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

LayoutXLM not loaded #12194

Closed · tommasodelorenzo closed this issue 3 years ago

tommasodelorenzo commented 3 years ago

I am trying to use the layoutxlm model, but I get the following error when loading either the tokenizer or the model, with AutoTokenizer.from_pretrained("microsoft/layoutxlm-base") and AutoModelForTokenClassification.from_pretrained("microsoft/layoutxlm-base") respectively.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-381e58ab16b7> in <module>
      1 model_name="microsoft/layoutxlm-base"
----> 2 tokenizer = AutoTokenizer.from_pretrained(model_name)

/opt/conda/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    400         kwargs["_from_auto"] = True
    401         if not isinstance(config, PretrainedConfig):
--> 402             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    403 
    404         use_fast = kwargs.pop("use_fast", True)

/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    430         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    431         if "model_type" in config_dict:
--> 432             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    433             return config_class.from_dict(config_dict, **kwargs)
    434         else:

KeyError: 'layoutxlm'

Using transformers 4.6.1
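
The KeyError comes from the checkpoint's config.json, which declares model_type "layoutxlm", a key that CONFIG_MAPPING does not contain in transformers 4.6.1. A minimal sketch that reproduces just that lookup (the printed values are what the traceback above implies):

from transformers import PretrainedConfig
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# Fetch only the raw config dict, exactly as AutoConfig does before the mapping lookup.
config_dict, _ = PretrainedConfig.get_config_dict("microsoft/layoutxlm-base")
print(config_dict["model_type"])      # 'layoutxlm'
print("layoutxlm" in CONFIG_MAPPING)  # False on transformers 4.6.1, hence the KeyError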

NielsRogge commented 3 years ago

LayoutXLM is not yet supported by the AutoModel API. You can probably plug it into a LayoutLMTokenizer and a LayoutLMForTokenClassification.

tommasodelorenzo commented 3 years ago

I had already tried that, but it did not work.

model_name="microsoft/layoutxlm-base"
tokenizer = LayoutLMTokenizer.from_pretrained(model_name)

This gives me the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-720ed3162350> in <module>
      1 model_name="microsoft/layoutxlm-base"
----> 2 tokenizer = LayoutLMTokenizer.from_pretrained(model_name)

/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1717                 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
   1718 
-> 1719         return cls._from_pretrained(
   1720             resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
   1721         )

/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs)
   1789         # Instantiate tokenizer.
   1790         try:
-> 1791             tokenizer = cls(*init_inputs, **init_kwargs)
   1792         except OSError:
   1793             raise OSError(

/opt/conda/lib/python3.8/site-packages/transformers/models/bert/tokenization_bert.py in __init__(self, vocab_file, do_lower_case, do_basic_tokenize, never_split, unk_token, sep_token, pad_token, cls_token, mask_token, tokenize_chinese_chars, strip_accents, **kwargs)
    191         )
    192 
--> 193         if not os.path.isfile(vocab_file):
    194             raise ValueError(
    195                 f"Can't find a vocabulary file at path '{vocab_file}'. To load the vocabulary from a Google pretrained "

/opt/conda/lib/python3.8/genericpath.py in isfile(path)
     28     """Test whether a path is a regular file"""
     29     try:
---> 30         st = os.stat(path)
     31     except (OSError, ValueError):
     32         return False

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
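
The vocab_file that ends up as None here is a vocab.txt: LayoutLMTokenizer is a BERT-style WordPiece tokenizer, while the layoutxlm-base checkpoint ships an XLM-RoBERTa-style SentencePiece vocabulary, so there is no vocab.txt to resolve. As a sketch (assuming the checkpoint really contains a sentencepiece.bpe.model, as XLM-RoBERTa checkpoints do), the vocabulary itself can be loaded with XLMRobertaTokenizer, although that still does not give a layout-aware tokenizer:

from transformers import XLMRobertaTokenizer

# Assumption: microsoft/layoutxlm-base ships an XLM-RoBERTa style sentencepiece.bpe.model.
tokenizer = XLMRobertaTokenizer.from_pretrained("microsoft/layoutxlm-base")
print(tokenizer.tokenize("hello world"))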

Meanwhile, model = LayoutLMModel.from_pretrained(model_name) does not throw an error, but it does not seem able to initialize the weights correctly:

You are using a model of type layoutxlm to instantiate a model of type layoutlm. This is not supported for all configurations of models and can yield errors.
Some weights of the model checkpoint at microsoft/layoutxlm-base were not used when initializing LayoutLMModel: ['layoutlmv2.visual.backbone.bottom_up.res4.1.conv3.norm.running_var', 'layoutlmv2.encoder.layer.7.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv3.norm.running_var', 'layoutlmv2.visual.backbone.fpn_output5.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv3.norm.weight', 'layoutlmv2.encoder.layer.3.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.9.attention.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.10.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.norm.running_mean', 'layoutlmv2.encoder.layer.3.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv3.norm.running_var', 'layoutlmv2.embeddings.position_embeddings.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv1.norm.bias', 'layoutlmv2.embeddings.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.stem.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv1.weight', 'layoutlmv2.encoder.layer.5.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv1.norm.bias', 'layoutlmv2.visual.backbone.fpn_lateral3.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.6.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv3.norm.bias', 
'layoutlmv2.encoder.layer.9.attention.self.value.weight', 'layoutlmv2.encoder.layer.5.attention.self.query.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv3.norm.running_mean', 'layoutlmv2.pooler.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.shortcut.norm.running_var', 'layoutlmv2.encoder.layer.9.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv3.norm.bias', 'layoutlmv2.encoder.layer.0.intermediate.dense.bias', 'layoutlmv2.encoder.layer.10.output.dense.bias', 'layoutlmv2.visual_proj.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.2.attention.output.LayerNorm.weight', 'layoutlmv2.encoder.layer.0.attention.self.query.bias', 'layoutlmv2.encoder.layer.11.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv2.norm.weight', 'layoutlmv2.encoder.layer.5.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.5.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv2.norm.running_mean', 'layoutlmv2.embeddings.h_position_embeddings.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv1.weight', 'layoutlmv2.encoder.layer.10.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.3.attention.self.query.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv1.weight', 'layoutlmv2.encoder.layer.6.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv3.weight', 
'layoutlmv2.visual.backbone.bottom_up.res5.0.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res2.0.shortcut.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv2.norm.running_var', 'layoutlmv2.encoder.layer.9.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.8.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv1.weight', 'layoutlmv2.encoder.layer.3.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv2.norm.bias', 'layoutlmv2.encoder.layer.9.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv2.norm.weight', 'layoutlmv2.visual.pixel_mean', 'layoutlmv2.encoder.layer.9.output.LayerNorm.weight', 'layoutlmv2.encoder.layer.3.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.norm.bias', 'layoutlmv2.visual.backbone.fpn_output2.weight', 'layoutlmv2.encoder.layer.8.attention.self.query.weight', 'layoutlmv2.encoder.layer.2.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv1.norm.bias', 'layoutlmv2.encoder.layer.0.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.norm.weight', 'layoutlmv2.encoder.layer.11.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv1.norm.weight', 'layoutlmv2.encoder.layer.9.attention.self.value.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.weight', 'layoutlmv2.encoder.layer.6.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.5.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv1.norm.running_mean', 
'layoutlmv2.visual.backbone.bottom_up.res2.0.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv2.norm.running_mean', 'layoutlmv2.encoder.layer.4.attention.self.query.bias', 'layoutlmv2.encoder.layer.2.attention.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.11.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.norm.bias', 'layoutlmv2.encoder.layer.1.attention.self.key.weight', 'layoutlmv2.encoder.layer.3.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv1.norm.weight', 'layoutlmv2.encoder.layer.8.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.norm.weight', 'layoutlmv2.encoder.layer.0.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.4.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.fpn_output3.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.norm.running_var', 'layoutlmv2.encoder.layer.6.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv1.norm.running_mean', 'layoutlmv2.encoder.layer.11.attention.self.query.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.4.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv1.weight', 
'layoutlmv2.visual.backbone.bottom_up.res4.14.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv3.weight', 'layoutlmv2.encoder.layer.6.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv2.weight', 'layoutlmv2.visual.backbone.fpn_lateral2.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv3.norm.bias', 'layoutlmv2.encoder.layer.11.attention.self.value.bias', 'layoutlmv2.visual.backbone.fpn_output3.weight', 'layoutlmv2.encoder.layer.6.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv1.norm.weight', 'layoutlmv2.encoder.layer.7.attention.self.value.bias', 'layoutlmv2.encoder.layer.2.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv1.norm.weight', 'layoutlmv2.encoder.layer.7.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.8.attention.self.key.weight', 'layoutlmv2.encoder.layer.10.attention.self.value.weight', 'layoutlmv2.encoder.layer.1.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv2.norm.bias', 'layoutlmv2.encoder.layer.5.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv1.norm.running_var', 'layoutlmv2.encoder.layer.3.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv1.norm.running_var', 'layoutlmv2.encoder.layer.4.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv3.weight', 'layoutlmv2.encoder.layer.4.attention.self.query.weight', 'layoutlmv2.encoder.layer.8.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.0.shortcut.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.weight', 'layoutlmv2.encoder.layer.9.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv2.norm.bias', 
'layoutlmv2.visual.backbone.bottom_up.res4.1.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.2.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.norm.weight', 'layoutlmv2.encoder.layer.10.attention.self.key.bias', 'layoutlmv2.encoder.layer.9.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv2.weight', 'layoutlmv2.encoder.layer.1.intermediate.dense.bias', 'layoutlmv2.embeddings.x_position_embeddings.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.10.attention.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.0.attention.self.query.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv1.norm.bias', 'layoutlmv2.encoder.layer.0.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv1.weight', 'layoutlmv2.encoder.layer.4.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.2.attention.self.query.weight', 'layoutlmv2.encoder.layer.8.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv1.weight', 'layoutlmv2.visual.backbone.fpn_lateral5.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv2.norm.running_var', 'layoutlmv2.encoder.layer.0.output.dense.weight', 'layoutlmv2.encoder.layer.2.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv2.norm.running_var', 
'layoutlmv2.visual.backbone.bottom_up.res4.2.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv3.weight', 'layoutlmv2.encoder.layer.6.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.norm.running_mean', 'layoutlmv2.encoder.layer.6.attention.self.query.weight', 'layoutlmv2.encoder.layer.3.attention.self.key.weight', 'layoutlmv2.visual_proj.bias', 'layoutlmv2.encoder.layer.10.attention.self.query.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv1.norm.weight', 'layoutlmv2.embeddings.position_ids', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.6.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.shortcut.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv2.weight', 'layoutlmv2.encoder.layer.4.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv2.norm.bias', 'layoutlmv2.encoder.layer.2.attention.self.key.weight', 'layoutlmv2.encoder.layer.10.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.shortcut.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.stem.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.norm.running_mean', 'layoutlmv2.encoder.layer.5.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv3.norm.bias', 'layoutlmv2.encoder.layer.9.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.9.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv3.weight', 'layoutlmv2.encoder.layer.0.output.dense.bias', 'layoutlmv2.encoder.layer.1.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.norm.running_var', 
'layoutlmv2.visual.backbone.bottom_up.res4.16.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.11.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.8.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.0.shortcut.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.1.attention.self.key.bias', 'layoutlmv2.encoder.layer.7.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.10.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv2.norm.running_var', 'layoutlmv2.visual.backbone.fpn_output2.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv3.norm.weight', 'layoutlmv2.encoder.layer.8.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.shortcut.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv1.norm.running_var', 'layoutlmv2.encoder.layer.0.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.stem.conv1.weight', 'layoutlmv2.encoder.layer.6.intermediate.dense.weight', 'layoutlmv2.encoder.layer.8.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv3.norm.weight', 'layoutlmv2.encoder.layer.6.attention.self.value.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.stem.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv1.norm.running_var', 'layoutlmv2.encoder.layer.11.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv2.norm.running_mean', 'layoutlmv2.encoder.layer.11.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv1.weight', 'layoutlmv2.encoder.layer.8.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv1.weight', 
'layoutlmv2.encoder.layer.9.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.0.shortcut.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv3.weight', 'layoutlmv2.visual.backbone.fpn_lateral5.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv3.norm.bias', 'layoutlmv2.encoder.layer.11.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.weight', 'layoutlmv2.encoder.layer.7.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv3.norm.bias', 'layoutlmv2.encoder.layer.9.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv3.norm.bias', 'layoutlmv2.encoder.layer.7.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv2.norm.weight', 'layoutlmv2.visual.backbone.fpn_lateral4.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv2.weight', 'layoutlmv2.visual.backbone.fpn_output4.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv2.weight', 'layoutlmv2.encoder.layer.7.attention.self.query.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.norm.weight', 'layoutlmv2.encoder.layer.4.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv1.weight', 'layoutlmv2.encoder.layer.2.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.weight', 'layoutlmv2.encoder.layer.2.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.0.attention.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.10.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv2.norm.running_mean', 'layoutlmv2.encoder.layer.0.attention.output.dense.bias', 
'layoutlmv2.pooler.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv3.weight', 'layoutlmv2.encoder.layer.8.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv1.norm.running_var', 'layoutlmv2.encoder.layer.11.output.dense.weight', 'layoutlmv2.embeddings.word_embeddings.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv1.norm.running_var', 'layoutlmv2.embeddings.y_position_embeddings.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.stem.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv2.norm.running_var', 'layoutlmv2.encoder.layer.5.attention.self.value.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.fpn_output4.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv2.norm.num_batches_tracked', 'layoutlmv2.visual_LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv2.norm.bias', 'layoutlmv2.encoder.layer.4.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv3.norm.weight', 'layoutlmv2.visual_LayerNorm.weight', 'layoutlmv2.encoder.layer.11.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv2.norm.running_mean', 'layoutlmv2.encoder.layer.8.attention.self.value.weight', 'layoutlmv2.encoder.layer.10.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.shortcut.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv1.weight', 
'layoutlmv2.encoder.layer.6.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.10.attention.self.value.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.shortcut.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.0.shortcut.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.7.attention.self.key.bias', 'layoutlmv2.encoder.layer.1.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.0.shortcut.weight', 'layoutlmv2.encoder.layer.4.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv3.norm.running_var', 'layoutlmv2.encoder.layer.9.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv1.norm.weight', 'layoutlmv2.encoder.layer.7.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.stem.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.shortcut.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv1.norm.running_mean', 'layoutlmv2.encoder.layer.3.output.dense.weight', 'layoutlmv2.encoder.layer.6.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.1.attention.output.LayerNorm.weight', 
'layoutlmv2.visual.backbone.bottom_up.res4.14.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv2.weight', 'layoutlmv2.encoder.layer.4.attention.output.dense.weight', 'layoutlmv2.encoder.layer.8.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv3.norm.running_var', 'layoutlmv2.encoder.layer.0.attention.output.LayerNorm.weight', 'layoutlmv2.encoder.layer.11.output.LayerNorm.weight', 'layoutlmv2.encoder.layer.5.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.2.attention.self.value.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv3.norm.weight', 'layoutlmv2.encoder.layer.4.intermediate.dense.bias', 'layoutlmv2.encoder.layer.6.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv1.norm.running_var', 'layoutlmv2.encoder.layer.3.output.LayerNorm.weight', 'layoutlmv2.encoder.layer.7.output.dense.bias', 'layoutlmv2.encoder.layer.10.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.7.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv2.norm.weight', 'layoutlmv2.encoder.layer.0.attention.self.value.bias', 'layoutlmv2.visual.pixel_std', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv3.norm.running_var', 'layoutlmv2.encoder.layer.1.output.dense.bias', 'layoutlmv2.encoder.layer.5.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv3.weight', 'layoutlmv2.embeddings.token_type_embeddings.weight', 'layoutlmv2.encoder.layer.7.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.2.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv1.norm.weight', 'layoutlmv2.encoder.layer.4.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv3.norm.bias', 'layoutlmv2.encoder.layer.7.output.LayerNorm.bias', 
'layoutlmv2.visual.backbone.bottom_up.res3.0.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.fpn_lateral3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv2.weight', 'layoutlmv2.encoder.layer.1.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.norm.running_var', 'layoutlmv2.encoder.layer.5.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.9.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv3.norm.running_var', 'layoutlmv2.encoder.layer.4.attention.self.value.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.1.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.11.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.5.output.dense.bias', 'layoutlmv2.encoder.layer.9.attention.self.query.weight', 'layoutlmv2.visual_segment_embedding', 'layoutlmv2.encoder.layer.1.attention.self.query.weight', 'layoutlmv2.encoder.layer.10.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.11.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv1.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.6.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv1.norm.running_mean', 
'layoutlmv2.visual.backbone.bottom_up.res4.1.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.shortcut.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.2.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv1.norm.weight', 'layoutlmv2.encoder.layer.2.output.dense.bias', 'layoutlmv2.encoder.layer.5.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv3.norm.weight', 'layoutlmv2.encoder.layer.1.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.7.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv2.norm.bias', 'layoutlmv2.embeddings.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.2.conv2.weight', 'layoutlmv2.encoder.layer.0.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv2.norm.bias', 'layoutlmv2.encoder.layer.0.attention.self.key.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.21.conv3.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.0.shortcut.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv1.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.11.intermediate.dense.bias', 'layoutlmv2.encoder.layer.3.intermediate.dense.bias', 'layoutlmv2.encoder.layer.2.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.2.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.norm.bias', 'layoutlmv2.encoder.layer.10.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res2.0.shortcut.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv3.norm.num_batches_tracked', 
'layoutlmv2.visual.backbone.bottom_up.res4.3.conv2.norm.running_var', 'layoutlmv2.encoder.layer.1.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv2.norm.bias', 'layoutlmv2.encoder.layer.9.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv2.norm.running_var', 'layoutlmv2.encoder.layer.8.attention.self.value.bias', 'layoutlmv2.encoder.layer.4.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv2.norm.weight', 'layoutlmv2.visual.backbone.fpn_output5.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.1.conv1.weight', 'layoutlmv2.encoder.layer.1.intermediate.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.12.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.fpn_lateral4.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.norm.running_var', 'layoutlmv2.encoder.layer.4.attention.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.1.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv3.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv3.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv2.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.20.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.13.conv2.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv3.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.5.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res5.0.shortcut.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.6.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv3.weight', 'layoutlmv2.encoder.layer.7.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv3.norm.num_batches_tracked', 'layoutlmv2.embeddings.w_position_embeddings.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.1.conv2.weight', 'layoutlmv2.encoder.layer.1.attention.self.value.bias', 'layoutlmv2.encoder.layer.11.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.17.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res5.0.shortcut.norm.running_mean', 'layoutlmv2.visual.backbone.bottom_up.res4.18.conv3.norm.running_var', 'layoutlmv2.encoder.layer.2.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.1.conv2.norm.num_batches_tracked', 'layoutlmv2.encoder.layer.8.intermediate.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.3.conv3.norm.weight', 'layoutlmv2.encoder.layer.2.intermediate.dense.weight', 'layoutlmv2.encoder.layer.0.attention.output.dense.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv1.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.14.conv1.norm.bias', 'layoutlmv2.encoder.layer.3.attention.self.value.bias', 'layoutlmv2.encoder.layer.5.attention.self.key.weight', 
'layoutlmv2.visual.backbone.fpn_lateral2.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv3.norm.running_mean', 'layoutlmv2.encoder.layer.1.attention.self.query.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv1.norm.running_var', 'layoutlmv2.encoder.layer.3.attention.output.LayerNorm.bias', 'layoutlmv2.encoder.layer.3.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res3.0.conv3.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv2.weight', 'layoutlmv2.encoder.layer.6.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res2.0.conv1.norm.running_mean', 'layoutlmv2.encoder.layer.8.attention.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.3.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res5.0.conv1.weight', 'layoutlmv2.visual.backbone.bottom_up.res5.0.shortcut.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.5.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv3.norm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.22.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.7.conv1.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv1.norm.bias', 'layoutlmv2.encoder.layer.7.attention.self.value.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv2.norm.bias', 'layoutlmv2.encoder.layer.3.output.dense.bias', 'layoutlmv2.visual.backbone.bottom_up.res3.2.conv2.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.19.conv2.norm.running_var', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv1.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.11.conv1.norm.bias', 'layoutlmv2.encoder.layer.10.output.LayerNorm.weight', 'layoutlmv2.visual.backbone.bottom_up.res4.16.conv2.norm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.15.conv3.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.8.conv2.norm.weight', 'layoutlmv2.encoder.layer.3.attention.self.key.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.10.conv2.weight', 'layoutlmv2.encoder.layer.5.attention.output.LayerNorm.bias', 'layoutlmv2.visual.backbone.bottom_up.res4.0.conv1.norm.num_batches_tracked', 'layoutlmv2.visual.backbone.bottom_up.res4.4.conv2.norm.running_mean']
- This IS expected if you are initializing LayoutLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LayoutLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LayoutLMModel were not initialized from the model checkpoint at microsoft/layoutxlm-base and are newly initialized: ['layoutlm.encoder.layer.4.intermediate.dense.weight', 'layoutlm.encoder.layer.4.intermediate.dense.bias', 'layoutlm.encoder.layer.10.attention.self.query.weight', 'layoutlm.encoder.layer.6.output.LayerNorm.bias', 'layoutlm.encoder.layer.2.attention.self.value.bias', 'layoutlm.encoder.layer.2.intermediate.dense.bias', 'layoutlm.encoder.layer.5.output.dense.bias', 'layoutlm.encoder.layer.10.output.LayerNorm.bias', 'layoutlm.encoder.layer.9.output.dense.weight', 'layoutlm.encoder.layer.0.attention.self.value.weight', 'layoutlm.encoder.layer.7.attention.self.key.bias', 'layoutlm.encoder.layer.2.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.1.attention.self.key.weight', 'layoutlm.encoder.layer.8.output.dense.bias', 'layoutlm.encoder.layer.1.intermediate.dense.weight', 'layoutlm.encoder.layer.6.attention.self.query.bias', 'layoutlm.encoder.layer.11.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.3.attention.self.query.bias', 'layoutlm.encoder.layer.1.attention.self.value.bias', 'layoutlm.encoder.layer.7.output.dense.bias', 'layoutlm.encoder.layer.5.attention.output.dense.bias', 'layoutlm.encoder.layer.9.attention.output.dense.bias', 'layoutlm.encoder.layer.9.attention.self.key.bias', 'layoutlm.encoder.layer.9.output.dense.bias', 'layoutlm.encoder.layer.5.output.LayerNorm.bias', 'layoutlm.encoder.layer.6.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.6.output.dense.bias', 'layoutlm.pooler.dense.weight', 'layoutlm.encoder.layer.1.attention.output.dense.bias', 'layoutlm.encoder.layer.7.output.LayerNorm.bias', 'layoutlm.encoder.layer.2.attention.self.key.bias', 'layoutlm.encoder.layer.1.attention.self.value.weight', 'layoutlm.encoder.layer.3.attention.output.dense.weight', 'layoutlm.encoder.layer.10.intermediate.dense.weight', 'layoutlm.encoder.layer.6.attention.self.query.weight', 'layoutlm.encoder.layer.6.attention.output.dense.weight', 'layoutlm.encoder.layer.7.intermediate.dense.bias', 'layoutlm.encoder.layer.3.intermediate.dense.weight', 'layoutlm.encoder.layer.7.attention.self.value.weight', 'layoutlm.encoder.layer.8.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.6.attention.self.key.weight', 'layoutlm.encoder.layer.1.output.dense.bias', 'layoutlm.encoder.layer.3.attention.self.key.weight', 'layoutlm.encoder.layer.5.output.LayerNorm.weight', 'layoutlm.encoder.layer.1.attention.self.query.bias', 'layoutlm.encoder.layer.11.attention.output.dense.weight', 'layoutlm.encoder.layer.10.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.0.attention.self.query.weight', 'layoutlm.encoder.layer.11.attention.self.key.bias', 'layoutlm.encoder.layer.2.attention.self.query.weight', 'layoutlm.encoder.layer.11.attention.self.query.weight', 'layoutlm.encoder.layer.7.attention.self.key.weight', 'layoutlm.encoder.layer.10.output.dense.weight', 'layoutlm.encoder.layer.0.attention.self.key.bias', 'layoutlm.encoder.layer.7.attention.self.query.bias', 'layoutlm.encoder.layer.7.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.9.intermediate.dense.bias', 'layoutlm.encoder.layer.1.attention.self.key.bias', 'layoutlm.encoder.layer.1.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.8.attention.self.value.weight', 'layoutlm.encoder.layer.7.attention.self.value.bias', 'layoutlm.encoder.layer.8.attention.self.key.bias', 'layoutlm.encoder.layer.5.attention.self.query.bias', 
'layoutlm.encoder.layer.11.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.10.attention.self.key.weight', 'layoutlm.encoder.layer.2.attention.self.value.weight', 'layoutlm.encoder.layer.11.output.LayerNorm.bias', 'layoutlm.encoder.layer.10.attention.self.value.weight', 'layoutlm.encoder.layer.1.intermediate.dense.bias', 'layoutlm.encoder.layer.2.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.4.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.0.output.dense.weight', 'layoutlm.encoder.layer.4.output.LayerNorm.weight', 'layoutlm.encoder.layer.7.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.3.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.10.output.LayerNorm.weight', 'layoutlm.encoder.layer.3.attention.self.value.weight', 'layoutlm.embeddings.x_position_embeddings.weight', 'layoutlm.encoder.layer.11.attention.self.value.weight', 'layoutlm.encoder.layer.5.intermediate.dense.bias', 'layoutlm.encoder.layer.4.attention.self.query.bias', 'layoutlm.embeddings.word_embeddings.weight', 'layoutlm.encoder.layer.7.attention.self.query.weight', 'layoutlm.encoder.layer.6.output.dense.weight', 'layoutlm.encoder.layer.11.output.dense.bias', 'layoutlm.encoder.layer.2.intermediate.dense.weight', 'layoutlm.encoder.layer.8.attention.self.key.weight', 'layoutlm.encoder.layer.5.output.dense.weight', 'layoutlm.encoder.layer.6.attention.self.value.bias', 'layoutlm.encoder.layer.2.output.LayerNorm.weight', 'layoutlm.encoder.layer.9.attention.output.dense.weight', 'layoutlm.encoder.layer.3.output.dense.weight', 'layoutlm.encoder.layer.5.attention.self.value.weight', 'layoutlm.encoder.layer.9.attention.self.value.weight', 'layoutlm.encoder.layer.1.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.8.output.dense.weight', 'layoutlm.encoder.layer.1.output.LayerNorm.bias', 'layoutlm.encoder.layer.6.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.8.attention.self.value.bias', 'layoutlm.encoder.layer.2.attention.self.query.bias', 'layoutlm.encoder.layer.2.output.dense.bias', 'layoutlm.encoder.layer.4.attention.output.LayerNorm.bias', 'layoutlm.embeddings.h_position_embeddings.weight', 'layoutlm.encoder.layer.0.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.11.output.LayerNorm.weight', 'layoutlm.encoder.layer.5.attention.self.key.weight', 'layoutlm.encoder.layer.7.attention.output.dense.bias', 'layoutlm.encoder.layer.6.intermediate.dense.weight', 'layoutlm.embeddings.token_type_embeddings.weight', 'layoutlm.encoder.layer.11.attention.output.dense.bias', 'layoutlm.encoder.layer.9.attention.self.key.weight', 'layoutlm.encoder.layer.8.output.LayerNorm.weight', 'layoutlm.encoder.layer.6.output.LayerNorm.weight', 'layoutlm.encoder.layer.10.attention.output.dense.bias', 'layoutlm.encoder.layer.7.intermediate.dense.weight', 'layoutlm.encoder.layer.9.output.LayerNorm.weight', 'layoutlm.encoder.layer.0.output.LayerNorm.bias', 'layoutlm.encoder.layer.7.output.LayerNorm.weight', 'layoutlm.encoder.layer.8.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.9.attention.self.query.bias', 'layoutlm.encoder.layer.0.output.LayerNorm.weight', 'layoutlm.encoder.layer.3.intermediate.dense.bias', 'layoutlm.encoder.layer.4.attention.self.key.bias', 'layoutlm.encoder.layer.5.attention.self.query.weight', 'layoutlm.encoder.layer.2.output.dense.weight', 'layoutlm.encoder.layer.5.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.5.intermediate.dense.weight', 'layoutlm.encoder.layer.5.attention.self.key.bias', 
'layoutlm.encoder.layer.8.attention.output.dense.bias', 'layoutlm.encoder.layer.10.output.dense.bias', 'layoutlm.encoder.layer.9.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.4.attention.self.value.weight', 'layoutlm.encoder.layer.1.output.dense.weight', 'layoutlm.encoder.layer.4.output.dense.weight', 'layoutlm.encoder.layer.8.attention.output.dense.weight', 'layoutlm.encoder.layer.3.output.LayerNorm.bias', 'layoutlm.encoder.layer.11.intermediate.dense.weight', 'layoutlm.encoder.layer.4.output.LayerNorm.bias', 'layoutlm.encoder.layer.11.output.dense.weight', 'layoutlm.encoder.layer.7.attention.output.dense.weight', 'layoutlm.embeddings.LayerNorm.weight', 'layoutlm.encoder.layer.2.attention.self.key.weight', 'layoutlm.encoder.layer.11.intermediate.dense.bias', 'layoutlm.encoder.layer.2.output.LayerNorm.bias', 'layoutlm.encoder.layer.3.output.dense.bias', 'layoutlm.encoder.layer.3.attention.self.key.bias', 'layoutlm.encoder.layer.8.attention.self.query.bias', 'layoutlm.encoder.layer.1.output.LayerNorm.weight', 'layoutlm.embeddings.w_position_embeddings.weight', 'layoutlm.encoder.layer.9.intermediate.dense.weight', 'layoutlm.encoder.layer.10.attention.self.value.bias', 'layoutlm.encoder.layer.8.attention.self.query.weight', 'layoutlm.encoder.layer.9.attention.self.value.bias', 'layoutlm.encoder.layer.4.attention.output.dense.bias', 'layoutlm.encoder.layer.11.attention.self.query.bias', 'layoutlm.encoder.layer.5.attention.output.dense.weight', 'layoutlm.encoder.layer.0.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.11.attention.self.value.bias', 'layoutlm.encoder.layer.0.output.dense.bias', 'layoutlm.encoder.layer.3.attention.output.dense.bias', 'layoutlm.encoder.layer.9.attention.self.query.weight', 'layoutlm.encoder.layer.3.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.3.attention.self.query.weight', 'layoutlm.embeddings.LayerNorm.bias', 'layoutlm.encoder.layer.10.attention.self.key.bias', 'layoutlm.encoder.layer.2.attention.output.dense.bias', 'layoutlm.encoder.layer.6.intermediate.dense.bias', 'layoutlm.encoder.layer.7.output.dense.weight', 'layoutlm.encoder.layer.10.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.9.attention.output.LayerNorm.bias', 'layoutlm.encoder.layer.0.attention.self.query.bias', 'layoutlm.encoder.layer.2.attention.output.dense.weight', 'layoutlm.embeddings.position_embeddings.weight', 'layoutlm.encoder.layer.10.attention.self.query.bias', 'layoutlm.encoder.layer.4.attention.self.key.weight', 'layoutlm.encoder.layer.5.attention.output.LayerNorm.weight', 'layoutlm.encoder.layer.0.attention.self.key.weight', 'layoutlm.encoder.layer.8.intermediate.dense.weight', 'layoutlm.encoder.layer.0.attention.output.dense.weight', 'layoutlm.encoder.layer.3.attention.self.value.bias', 'layoutlm.encoder.layer.10.intermediate.dense.bias', 'layoutlm.encoder.layer.11.attention.self.key.weight', 'layoutlm.encoder.layer.4.output.dense.bias', 'layoutlm.encoder.layer.3.output.LayerNorm.weight', 'layoutlm.encoder.layer.0.intermediate.dense.bias', 'layoutlm.encoder.layer.5.attention.self.value.bias', 'layoutlm.encoder.layer.0.attention.output.dense.bias', 'layoutlm.pooler.dense.bias', 'layoutlm.encoder.layer.6.attention.self.value.weight', 'layoutlm.encoder.layer.0.attention.self.value.bias', 'layoutlm.encoder.layer.6.attention.self.key.bias', 'layoutlm.encoder.layer.1.attention.output.dense.weight', 'layoutlm.encoder.layer.4.attention.self.value.bias', 'layoutlm.encoder.layer.6.attention.output.dense.bias', 
'layoutlm.embeddings.y_position_embeddings.weight', 'layoutlm.encoder.layer.9.output.LayerNorm.bias', 'layoutlm.encoder.layer.4.attention.output.dense.weight', 'layoutlm.encoder.layer.10.attention.output.dense.weight', 'layoutlm.encoder.layer.1.attention.self.query.weight', 'layoutlm.encoder.layer.8.output.LayerNorm.bias', 'layoutlm.encoder.layer.0.intermediate.dense.weight', 'layoutlm.encoder.layer.4.attention.self.query.weight', 'layoutlm.encoder.layer.8.intermediate.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
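(To see directly why nothing lines up, the raw checkpoint can be inspected. This is only a hedged sanity check, assuming torch and huggingface_hub are installed and that the repo ships a pytorch_model.bin file.)

import torch
from huggingface_hub import hf_hub_download

# Download the raw LayoutXLM checkpoint and look at its top-level parameter prefixes.
ckpt_path = hf_hub_download("microsoft/layoutxlm-base", "pytorch_model.bin")
state_dict = torch.load(ckpt_path, map_location="cpu")
print(sorted({name.split(".")[0] for name in state_dict}))
# Given the warning above, this should print 'layoutlmv2' rather than 'layoutlm',
# which is why LayoutLMModel cannot reuse any of these weights.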
NielsRogge commented 3 years ago

Hmm ok, I thought LayoutXLM was equivalent to LayoutLM, but apparently it isn't: the checkpoint's weights live under a layoutlmv2.* prefix, so none of them map onto LayoutLMModel. One would need to add LayoutXLM to HuggingFace Transformers in order to load it properly. Otherwise, you can use the newly released layoutlmft package by the original authors, as explained here.
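For what it's worth, here is a minimal loading sketch assuming a later transformers release that ships the LayoutLMv2/LayoutXLM classes (it also needs detectron2 for the visual backbone; the label count, words, and boxes below are made-up placeholders):

from PIL import Image
from transformers import (
    LayoutLMv2FeatureExtractor,
    LayoutLMv2ForTokenClassification,
    LayoutXLMProcessor,
    LayoutXLMTokenizer,
)

model_name = "microsoft/layoutxlm-base"

# apply_ocr=False because we supply our own words and 0-1000 normalized boxes.
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
tokenizer = LayoutXLMTokenizer.from_pretrained(model_name)
processor = LayoutXLMProcessor(feature_extractor, tokenizer)

# The token-classification head is newly initialized and still needs fine-tuning;
# num_labels=7 is just a placeholder for a downstream label set.
model = LayoutLMv2ForTokenClassification.from_pretrained(model_name, num_labels=7)

image = Image.new("RGB", (1000, 1000), color="white")   # stand-in for a document scan
words = ["Invoice", "number", "12345"]                   # stand-in OCR output
boxes = [[48, 84, 150, 100], [160, 84, 230, 100], [240, 84, 300, 100]]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
outputs = model(**encoding)   # outputs.logits has shape (batch, seq_len, num_labels)

Until such a release is available, the layoutlmft package mentioned above exposes similar LayoutLMv2-style classes for LayoutXLM.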

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.