yahoojapan / JGLUE

JGLUE: Japanese General Language Understanding Evaluation
Creative Commons Attribution Share Alike 4.0 International

The "label" column in the JSTS dataset is a string dtype #3

Closed Katsumata420 closed 1 year ago

Katsumata420 commented 2 years ago

Hi, thanks for publishing JGLUE.

The label column in the JSTS dataset is stored as a string rather than a number: https://github.com/yahoojapan/JGLUE/blob/53e5ecd9dfa7bbe6d84f818d599bfb4393dd639d/datasets/jsts-v1.0/valid-v1.0.json#L1 Is there a reason for this?

I think that run_glue.py determines if a task is a regression task or not by the dtype of the label column, so if it is a string dtype, it is treated as a classification task. https://github.com/huggingface/transformers/blob/v4.9.2/examples/pytorch/text-classification/run_glue.py
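To illustrate the point above, here is a minimal sketch of a dtype-based task check. It is a simplified stand-in, not the actual run_glue.py code: the function name `infer_task_type` and the sample records (including their field names and values) are hypothetical, and the check uses plain Python types instead of the dataset feature dtypes.

```python
import json

# Hypothetical JSON Lines records; field names and values are made up for illustration.
STRING_LABEL_LINE = '{"sentence1": "a", "sentence2": "b", "label": "2.4"}'
FLOAT_LABEL_LINE = '{"sentence1": "a", "sentence2": "b", "label": 2.4}'


def infer_task_type(json_line: str) -> str:
    """Return "regression" only when the label is parsed as a float.

    Assumption: this mirrors the idea of a dtype-based check, where any
    non-float label (including a numeric string) falls back to classification.
    """
    label = json.loads(json_line)["label"]
    if isinstance(label, float):
        return "regression"
    return "classification"


print(infer_task_type(STRING_LABEL_LINE))
print(infer_task_type(FLOAT_LABEL_LINE))
```

With a check of this kind, a label such as "2.4" is just another category name, so every distinct score string becomes its own class.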

In fact, fine-tuning BERT on JSTS produced a 26-class classification model instead of a regression model. (I have patched run_glue.py as a workaround.)

tomohideshibata commented 2 years ago

Thank you for reporting this issue. Yes, this is a bug. We are going to fix the JSTS label type (string -> float) and run several experiments. Please wait for the next release.

Thank you in advance.
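Until a fixed release is available, one possible local workaround is to cast the labels yourself before training. The sketch below is an assumption about how that could be done, not the maintainers' fix or the patch mentioned earlier: the helper name `cast_labels_to_float` is hypothetical, and it assumes the data is JSON Lines with a "label" field holding a numeric string.

```python
import json


def cast_labels_to_float(json_lines):
    """Return new JSON Lines strings with each "label" converted to a float.

    Assumption: every non-empty line is a JSON object with a "label" key
    whose value can be parsed by float().
    """
    converted = []
    for line in json_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record["label"] = float(record["label"])
        # ensure_ascii=False keeps Japanese text readable in the output
        converted.append(json.dumps(record, ensure_ascii=False))
    return converted


# Hypothetical record for illustration only.
sample = ['{"sentence1": "犬", "sentence2": "猫", "label": "3.0"}']
for out_line in cast_labels_to_float(sample):
    print(out_line)
```

After this conversion the label is loaded as a floating-point value, which is what a dtype-based check needs in order to treat the task as regression.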