yahoojapan / JGLUE

JGLUE: Japanese General Language Understanding Evaluation
Creative Commons Attribution Share Alike 4.0 International

DeBERTa models support #1

Open · KoichiYasuoka opened this issue 2 years ago

KoichiYasuoka commented 2 years ago

Thank you for releasing JGLUE, but I could not evaluate my deberta-base-japanese-aozora. There seem to be two problems: the transformers version required by the fine-tuning instructions, and the question-answering script.

I tried forcing transformers v4.19.2 to work around them, but I could not resolve the latter. Please see the details in my diary (written in Japanese). Do you have any idea?

KoichiYasuoka commented 2 years ago
# check out transformers v4.19.2 and apply the JGLUE fine-tuning patch to it
!test -d transformers-4.19.2 || git clone -b v4.19.2 --depth=1 https://github.com/huggingface/transformers transformers-4.19.2
!test -d JGLUE || ( git clone --depth=1 https://github.com/yahoojapan/JGLUE && cat JGLUE/fine-tuning/patch/transformers-4.9.2_jglue-1.0.0.patch | ( cd transformers-4.19.2 && patch -p1 ) )
!cd transformers-4.19.2 && pip install .
!pip install -r transformers-4.19.2/examples/pytorch/text-classification/requirements.txt
!pip install protobuf==3.19.1 tensorboard
import json
# flatten JSQuAD's nested SQuAD-style JSON (data -> paragraphs -> qas)
# into the flat one-record-per-question format that run_qa.py expects
for f in ["train-v1.0.json","valid-v1.0.json"]:
  with open("JGLUE/datasets/jsquad-v1.0/"+f,"r",encoding="utf-8") as r:
    j=json.load(r)
  u=[]
  for d in j["data"]:
    for p in d["paragraphs"]:
      for q in p["qas"]:
        u.append({"id":q["id"],
                  "title":d["title"],
                  "context":p["context"],
                  "question":q["question"],
                  "answers":{"text":[x["text"] for x in q["answers"]],
                             "answer_start":[x["answer_start"] for x in q["answers"]]}})
  with open(f,"w",encoding="utf-8") as w:
    json.dump({"data":u},w,ensure_ascii=False,indent=2)
# fine-tune and evaluate on the converted files
!python transformers-4.19.2/examples/pytorch/question-answering/run_qa.py --model_name_or_path KoichiYasuoka/deberta-base-japanese-aozora --do_train --do_eval --max_seq_length 384 --learning_rate 5e-05 --num_train_epochs 3 --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --output_dir ./output_jsquad2 --overwrite_output_dir --train_file train-v1.0.json --validation_file valid-v1.0.json --save_steps 5000 --warmup_ratio 0.1
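
For reference, each flattened record produced by the loop above is a plain SQuAD-style entry of roughly this shape (all values here are hypothetical placeholders):

record = {
    "id": "...",                            # question id from the qas entry
    "title": "...",                         # article title
    "context": "<title> [SEP] <passage>",   # JSQuAD context with its hard-coded separator
    "question": "...",
    "answers": {"text": ["<answer span>"], "answer_start": [0]},
}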

I've just been trying the program above on Google Colaboratory, but I'm not sure that this conversion is really suitable for JSQuAD. @tomohideshibata -san, does [SEP] in the jsquad-v1.0 files mean sep_token or not?

tomohideshibata commented 2 years ago

Thank you for trying JGLUE.

For the first comment, the latest version, v4.19.2, works. (We have updated the explanation of the supported Hugging Face transformers versions in https://github.com/yahoojapan/JGLUE/commit/53e5ecd9dfa7bbe6d84f818d599bfb4393dd639d.)

For the second comment, we used examples/legacy/question-answering/run_squad.py because examples/pytorch/question-answering/run_qa.py supports only fast tokenizers (and BertJapaneseTokenizer does not have a fast version). We will check whether run_qa.py works with JSQuAD.
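
For reference, a quick way to check whether a given checkpoint ships a fast tokenizer is the is_fast attribute (a minimal sketch; the Tohoku BERT checkpoint name here is only an example):

from transformers import AutoTokenizer

# cl-tohoku/bert-base-japanese resolves to BertJapaneseTokenizer,
# which has no fast (Rust-backed) counterpart
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
print(tokenizer.is_fast)  # False, so run_qa.py rejects it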

Does [SEP] in the jsquad-v1.0 files mean sep_token or not?

Yes.

KoichiYasuoka commented 2 years ago

Thank you @tomohideshibata -san for confirming transformers v4.19.2. Here I realize that I need to replace [SEP] with another sep_token when I evaluate a model whose sep_token is not [SEP]. But unless that sep_token also consists of 5 characters, I should change answer_start too, shouldn't I? Umm...
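
To make the offset arithmetic concrete, a hypothetical illustration (the text and the </s> token are only examples):

# len("[SEP]") == 5 but len("</s>") == 4, so replacing the separator
# shifts every answer span that starts after it by 4 - 5 = -1 characters:
context = "吾輩は猫である [SEP] 名前はまだ無い。"
context = context.replace("[SEP]", "</s>")  # later answer_start values drift by -1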

tomohideshibata commented 2 years ago

I should change answer_start, shoudn't I?

Yes. In the current version, sep_token is hard-coded in the dataset. One way to solve this problem is to recalculate answer_start in the evaluation script, given the sep_token of the tokenizer in use. We will try this in the next version.
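
That idea might look something like this (a minimal sketch, not official JGLUE code; retarget_sep is a hypothetical helper):

def retarget_sep(context, answer_start, sep_token):
    """Replace the hard-coded [SEP] in a JSQuAD context with the given
    tokenizer's sep_token and shift answer_start accordingly."""
    old, new = "[SEP]", sep_token
    # each [SEP] occurring before the answer moves the answer by the
    # length difference between the two separator strings
    shift = (len(new) - len(old)) * context[:answer_start].count(old)
    return context.replace(old, new), answer_start + shift

# e.g. for a model whose sep_token is "</s>":
# context, start = retarget_sep(p["context"], a["answer_start"], tokenizer.sep_token)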

KoichiYasuoka commented 2 years ago

Thank you @tomohideshibata -san for the information about [SEP]. Well, I've just made a tentative patch https://github.com/KoichiYasuoka/JGLUE/blob/main/fine-tuning/patch/transformers-4.19.2_jglue-1.0.0.patch for transformers v4.19.2, where I included jsquad_metrics.py instead of changing the original squad_metrics.py. But I couldn't include jsquad.py, since I couldn't find a proper way to force [SEP] as sep_token in squad_convert_example_to_features() and its neighbors...

conan1024hao commented 2 years ago

We encountered a similar problem. examples/legacy/question-answering/run_squad.py does not work well with fast tokenizers; our model cannot run on that script even with use_fast=False. So we tested examples/pytorch/question-answering/run_qa.py: multilingual models and the Waseda RoBERTa run on it without trouble, but the Tohoku BERT tokenizer is not supported. The result for nlp-waseda/roberta-base-japanese is below (without any parameter tuning); run_qa.py seems to work fine as long as the tokenizer problem can be solved.

EM    F1
0.855 0.910

tomohideshibata commented 2 years ago

Thanks for reporting your results. We are also going to test run_qa.py.

kaisugi commented 2 years ago

I also tried run_qa.py (with trainer_qa.py and utils_qa.py) from transformers v4.19.2, but somehow I got an error like this:

  File "run_qa.py", line 661, in <module>
    main()
  File "run_qa.py", line 337, in main
    answer_column_name = "answers" if "answers" in column_names else column_names[2]
IndexError: list index out of range
KoichiYasuoka commented 2 years ago

Hi @kaisugi -san, I needed some kind of conversion for run_qa.py. My tentative script for Google Colaboratory is below:

# check out transformers v4.19.2 and apply the JGLUE fine-tuning patch to it
!test -d transformers-4.19.2 || git clone -b v4.19.2 --depth=1 https://github.com/huggingface/transformers transformers-4.19.2
!test -d JGLUE || ( git clone --depth=1 https://github.com/yahoojapan/JGLUE && cat JGLUE/fine-tuning/patch/transformers-4.9.2_jglue-1.1.0.patch | ( cd transformers-4.19.2 && patch -p1 ) )
!cd transformers-4.19.2 && pip install .
!pip install -r transformers-4.19.2/examples/pytorch/text-classification/requirements.txt
!pip install protobuf==3.19.1 tensorboard
import json
# flatten JSQuAD's nested SQuAD-style JSON (data -> paragraphs -> qas) into
# the flat one-record-per-question format that run_qa.py expects
# (this conversion is what avoids the IndexError above)
for f in ["train-v1.1.json","valid-v1.1.json"]:
  with open("JGLUE/datasets/jsquad-v1.1/"+f,"r",encoding="utf-8") as r:
    j=json.load(r)
  u=[]
  for d in j["data"]:
    for p in d["paragraphs"]:
      for q in p["qas"]:
        u.append({"id":q["id"],
                  "title":d["title"],
                  "context":p["context"],
                  "question":q["question"],
                  "answers":{"text":[x["text"] for x in q["answers"]],
                             "answer_start":[x["answer_start"] for x in q["answers"]]}})
  with open(f,"w",encoding="utf-8") as w:
    json.dump({"data":u},w,ensure_ascii=False,indent=2)
# fine-tune and evaluate on the converted files
!python transformers-4.19.2/examples/pytorch/question-answering/run_qa.py --model_name_or_path KoichiYasuoka/deberta-base-japanese-aozora --do_train --do_eval --max_seq_length 384 --learning_rate 5e-05 --num_train_epochs 3 --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --output_dir ./output_jsquad2 --overwrite_output_dir --train_file train-v1.1.json --validation_file valid-v1.1.json --save_steps 5000 --warmup_ratio 0.1
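
As a quick sanity check on the converted files: run_qa.py loads user-supplied JSON with field="data", so after the conversion the flat columns should be visible (a sketch using the datasets library):

from datasets import load_dataset

# mirror how run_qa.py reads --train_file/--validation_file JSON
ds = load_dataset("json", data_files={"validation": "valid-v1.1.json"}, field="data")
print(ds["validation"].column_names)
# expected: ['id', 'title', 'context', 'question', 'answers'] -> no more IndexError
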
kaisugi commented 2 years ago

@KoichiYasuoka

I confirmed that your script worked properly. Thanks!