huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

"config.json" does not include correct "id2label" and "label2id" after finetuning on NER task #2487

Closed: lecidhugo closed this issue 4 years ago

lecidhugo commented 4 years ago

🐛 Bug

Model I am using (Bert, XLNet....): xlmroberta

Language I am using the model on (English, Chinese....): English

The problem arises when using: the run_ner.py script

The script runs without errors, but the "config.json" file written to the output directory contains incorrect label mappings.

The task I am working on is: NER

To Reproduce

Steps to reproduce the behavior:

1. Run the run_ner.py script as follows: python run_ner.py --data_dir 0-data --model_type 'xlmroberta' --model_name_or_path 'xlm-roberta-large' --output_dir 1-out --max_seq_length 32 --do_train --do_eval --per_gpu_train_batch_size 8 --no_cuda --evaluate_during_training --logging_steps 1756 --save_steps 1756 --eval_all_checkpoints

2. Open the output directory. The file "config.json" contains "label2id": { "LABEL_0": 0, "LABEL_1": 1 } and "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, which are generic placeholders rather than the labels of the NER task.
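A quick way to confirm the symptom described in step 2 is to test whether the saved maps contain only generic names. The following is a minimal sketch, not part of run_ner.py; the helper name `has_placeholder_labels` is hypothetical, and the sample dictionary is copied from the report above.

```python
import json
import re

# Generic names have the form LABEL_0, LABEL_1, ...
PLACEHOLDER = re.compile(r"^LABEL_\d+$")


def has_placeholder_labels(config: dict) -> bool:
    """Return True if every entry in config["id2label"] is a generic LABEL_<n> name.

    Hypothetical helper for illustration; `config` is the parsed config.json.
    """
    id2label = config.get("id2label", {})
    return bool(id2label) and all(
        PLACEHOLDER.match(str(name)) for name in id2label.values()
    )


# The mapping reported in this issue.
reported = json.loads(
    '{"label2id": {"LABEL_0": 0, "LABEL_1": 1},'
    ' "id2label": {"0": "LABEL_0", "1": "LABEL_1"}}'
)
print(has_placeholder_labels(reported))  # True
```

To check a real run, the same function could be applied to the result of `json.load` on the "config.json" in the output directory.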

Expected behavior

I expect "config.json" to contain something like: "id2label": { "0": "B-LOC", "1": "B-MISC", "2": "B-ORG", "3": "I-LOC", "4": "I-MISC", "5": "I-ORG", "6": "I-PER", "7": "O" } and "label2id": { "B-LOC": 0, "B-MISC": 1, "B-ORG": 2, "I-LOC": 3, "I-MISC": 4, "I-ORG": 5, "I-PER": 6, "O": 7 }
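As a possible workaround, the expected maps could be written into "config.json" after training. The sketch below uses only the standard `json` module; the function names `build_label_maps` and `patch_config` are hypothetical, and it assumes the label list is given in the same order the training script used to assign ids (here, the order shown in the expected output above).

```python
import json
from pathlib import Path

# Label order taken from the expected output above; it is an assumption that
# this matches the id order used during training.
NER_LABELS = ["B-LOC", "B-MISC", "B-ORG", "I-LOC", "I-MISC", "I-ORG", "I-PER", "O"]


def build_label_maps(labels):
    """Build id2label (string keys, as stored in JSON) and label2id."""
    id2label = {str(i): label for i, label in enumerate(labels)}
    label2id = {label: i for i, label in enumerate(labels)}
    return id2label, label2id


def patch_config(config_path, labels):
    """Overwrite the two label maps in an existing config.json; keep other keys."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["id2label"], config["label2id"] = build_label_maps(labels)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For the run in this report, the call would be `patch_config("1-out/config.json", NER_LABELS)`. If the number of labels does not match the model's output size, the patched maps would be misleading, so the label list should be verified first.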

Environment

Additional context

Lalalaashen commented 4 years ago

I checked the code a few days ago: 'label2id' and 'id2label' did not seem to be used anywhere, and they did not affect code execution.
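This is consistent with predictions being decoded from the task's own label list rather than from the config maps. A minimal sketch of that idea follows; the function `decode_predictions` is hypothetical and is not taken from run_ner.py, and the label order is assumed to be the one from the expected output above.

```python
# Assumed label order (from the expected output in this issue).
NER_LABELS = ["B-LOC", "B-MISC", "B-ORG", "I-LOC", "I-MISC", "I-ORG", "I-PER", "O"]


def decode_predictions(pred_ids, labels):
    """Map per-sentence lists of predicted class ids to label strings."""
    label_map = {i: label for i, label in enumerate(labels)}
    return [[label_map[i] for i in sentence] for sentence in pred_ids]


print(decode_predictions([[7, 0, 3]], NER_LABELS))  # [['O', 'B-LOC', 'I-LOC']]
```

Under this approach the placeholder names in "config.json" would not change training or evaluation results, but they would still matter for any tool that reads labels from the saved config.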