microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.79k stars 2.52k forks

[markuplm] Unable to use with Huggingface #544

Open louis030195 opened 2 years ago

louis030195 commented 2 years ago

Describe the bug
Model: markuplm

To Reproduce
Steps to reproduce the behavior:

# Install from PyPI (run in a shell):
#   pip install transformers
# Or install from the main source code:
#   git clone https://github.com/huggingface/transformers && cd transformers && pip install .
from transformers import AutoTokenizer, MarkupLMForPretraining

tokenizer = AutoTokenizer.from_pretrained("microsoft/markuplm-large")

model = MarkupLMForPretraining.from_pretrained("microsoft/markuplm-large")

ValueError: Tokenizer class MarkupLMTokenizer does not exist or is not currently imported.

Expected behavior
The tokenizer and model load properly.

lockon-n commented 2 years ago

MarkupLM is not currently supported by the Hugging Face transformers package, so for now you can only use it by downloading our source code. We are working on making MarkupLM available in transformers soon.

NielsRogge commented 2 years ago

Hi,

I've added MarkupLM to Transformers here: https://github.com/NielsRogge/transformers/tree/modeling_markuplm/src/transformers/models/markuplm

However, I haven't opened a PR yet, as I'd like to have a MarkupLMProcessor (similar to LayoutLMv2Processor) that prepares all the data for the model, rather than only tokenizing text.

Feel free to work further on my branch.
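
To illustrate what "preparing all data" could involve beyond tokenizing text, here is a minimal sketch that pulls text nodes and their XPaths out of an HTML string using only the Python standard library. It is not the MarkupLMProcessor from the branch above; the names NodeXPathExtractor and extract_nodes_and_xpaths are hypothetical, and the always-indexed XPath format (/html[1]/body[1]/p[2]) is an assumption chosen for simplicity.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NodeXPathExtractor(HTMLParser):
    """Hypothetical helper: collects non-empty text nodes and an XPath for each."""

    def __init__(self):
        super().__init__()
        self.stack = []            # (tag, sibling index) for each open element
        self.child_counts = [{}]   # per-depth counters of child tags seen so far
        self.nodes = []
        self.xpaths = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(text)
            self.xpaths.append("".join(f"/{t}[{i}]" for t, i in self.stack))


def extract_nodes_and_xpaths(html):
    """Return (nodes, xpaths) as two parallel lists for the given HTML string."""
    parser = NodeXPathExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes, parser.xpaths


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"
    for node, xpath in zip(*extract_nodes_and_xpaths(sample)):
        print(xpath, "->", node)
```

A real processor would then tokenize each node and align the XPath information with the resulting tokens; that step is left out here.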

lockon-n commented 2 years ago

@NielsRogge Thanks for adding MarkupLM to the great transformers library! We have added a processor for MarkupLM, similar to LayoutLMv2Processor, as you requested, and opened a PR against your branch. However, this implementation is not complete, as we are not familiar with all the APIs in transformers. We would appreciate it very much if you could help us improve it and officially release it.

wolfshow commented 2 years ago

@NielsRogge Any updates on adding MarkupLM to Transformers?

iamnafets commented 1 year ago

@NielsRogge you are amazing. Thank you for this!

NielsRogge commented 1 year ago

MarkupLM is now part of the Transformers library, feel free to close this issue :)
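
For readers retrying the original reproduction on a Transformers release that includes MarkupLM, a minimal sketch might look like the following. It assumes the "microsoft/markuplm-large" checkpoint from the report and uses the generic AutoTokenizer and AutoModel classes rather than the MarkupLMForPretraining import from the original snippet; load_markuplm is a hypothetical helper name. Calling it downloads the checkpoint and needs network access.

```python
def load_markuplm(checkpoint="microsoft/markuplm-large"):
    """Hypothetical helper: load a MarkupLM tokenizer and base model by name."""
    # Imported lazily so that merely defining this function does not require
    # transformers to be installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_markuplm()
    print(type(tokenizer).__name__, type(model).__name__)
```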