Closed pogevip closed 2 years ago
I have the same problem. Could you please fix it?
The problem still exists.
This runs successfully:
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
But loading "microsoft/layoutlmv3-base-chinese" fails:
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base-chinese", apply_ocr=False)
TypeError: expected str, bytes or os.PathLike object, not NoneType
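For context, this is the message Python raises whenever None reaches a function that expects a file path. The sketch below reproduces that with the standard library only. That a tokenizer file path resolved to None in this case is an assumption and is not confirmed anywhere in this thread; path_error_message is a hypothetical helper name.

```python
import os


def path_error_message(path):
    """Return the TypeError message for a non-path value, or None if it is path-like."""
    try:
        # os.fspath accepts str, bytes, or os.PathLike and rejects everything else.
        os.fspath(path)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Assumption: a missing vocabulary file left the tokenizer with a None path.
    print(path_error_message(None))
```

If that assumption holds, the thing to look for is which file the tokenizer expected but could not find in the checkpoint.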
Version: 4.22.0.dev0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors
As far as I know, transformers doesn't support the Chinese LayoutLMv3, but unilm does. https://github.com/microsoft/unilm/tree/master/layoutlmv3
But I see that it also requires vocab.json and merges.txt, so I cannot load the tokenizer either. https://github.com/microsoft/unilm/blob/master/layoutlmv3/layoutlmft/models/layoutlmv3/tokenization_layoutlmv3.py
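A quick way to check this on a locally downloaded copy of the checkpoint is to list which tokenizer files are actually present. This is only a minimal sketch using the standard library: the helper names are hypothetical, vocab.json and merges.txt come from this thread, and sentencepiece.bpe.model is the file name used by XLM-RoBERTa style tokenizers. Whether the Chinese checkpoint ships that file is an assumption here.

```python
from pathlib import Path

# Files the byte-level BPE tokenizer asks for, as mentioned in this thread.
BPE_FILES = ("vocab.json", "merges.txt")
# Assumption: a SentencePiece model may be shipped instead of the BPE files.
SENTENCEPIECE_FILES = ("sentencepiece.bpe.model",)


def tokenizer_files_report(checkpoint_dir):
    """Map each known tokenizer file name to whether it exists in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return {name: (root / name).is_file() for name in BPE_FILES + SENTENCEPIECE_FILES}


def missing_bpe_files(checkpoint_dir):
    """List the BPE files that are absent from checkpoint_dir."""
    report = tokenizer_files_report(checkpoint_dir)
    return [name for name in BPE_FILES if not report[name]]


if __name__ == "__main__":
    # Hypothetical local path; point it at wherever the checkpoint was downloaded.
    local_dir = "./layoutlmv3-base-chinese"
    print(tokenizer_files_report(local_dir))
    print("missing BPE files:", missing_bpe_files(local_dir))
```

If both BPE files are missing and only a SentencePiece model is present, that would suggest the two checkpoints use different tokenizers rather than differently named copies of the same vocabulary.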
How did you solve it, please?
System Info
Who can help?
No response
Reproduction
none
Expected behavior
But we seem to need vocab.json and merges.txt to load the LayoutLMv3Tokenizer. So could you provide a function to convert them, or confirm whether there is a difference between these two tokenizers?