Closed Katsumata420 closed 2 years ago
Hi, thanks for publishing JGLUE.
The label column in JSTS is stored as a string rather than a number: https://github.com/yahoojapan/JGLUE/blob/53e5ecd9dfa7bbe6d84f818d599bfb4393dd639d/datasets/jsts-v1.0/valid-v1.0.json#L1 Is there a reason for this?
I believe run_glue.py decides whether a task is regression or classification from the dtype of the label column, so a string label causes JSTS to be treated as a classification task. https://github.com/huggingface/transformers/blob/v4.9.2/examples/pytorch/text-classification/run_glue.py
In fact, fine-tuning BERT on JSTS produced a 26-class classification model. (I have patched run_glue.py.)
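To illustrate the behaviour described above, here is a minimal sketch of a dtype-based task check. It is not the actual run_glue.py code: `infer_task` and the sample labels are hypothetical, and the sketch only assumes that float labels mean regression with a single output while any other label type means one class per distinct value.

```python
from typing import List, Tuple, Union

Label = Union[str, int, float]


def infer_task(labels: List[Label]) -> Tuple[str, int]:
    """Hypothetical helper mimicking a dtype-based check.

    Float labels -> regression with one output.
    Anything else -> classification with one class per distinct label.
    """
    if all(isinstance(x, float) for x in labels):
        return "regression", 1
    return "classification", len(set(labels))


# Made-up similarity scores stored as strings, as in the reported JSTS files.
string_labels = ["0.0", "2.4", "5.0", "2.4"]
# The same scores after casting to float.
float_labels = [float(x) for x in string_labels]

print(infer_task(string_labels))  # ('classification', 3)
print(infer_task(float_labels))   # ('regression', 1)
```

Under this assumption, every distinct score string becomes its own class, which would explain a classifier with as many outputs as there are distinct label values.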
Thank you for reporting this issue. Yes, this is a bug. We are going to fix the JSTS label type (string -> float) and run several experiments. Please wait for the next release.
Thank you in advance.
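Until a fixed release is available, one possible workaround is to cast the labels to float before training. The sketch below assumes the data is in JSON Lines format with the score under a `label` key; `labels_to_float` and the sample records are hypothetical and are not taken from the JGLUE files.

```python
import json


def labels_to_float(jsonl_text: str, label_key: str = "label") -> str:
    """Return JSON Lines text with `label_key` cast from string to float."""
    out = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record[label_key] = float(record[label_key])
        # ensure_ascii=False keeps Japanese text readable in the output
        out.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(out)


# Made-up records; the field names other than "label" are assumptions.
sample = (
    '{"sentence1": "a", "sentence2": "b", "label": "2.4"}\n'
    '{"sentence1": "c", "sentence2": "d", "label": "0.0"}'
)
converted = labels_to_float(sample)
print(converted)
```

With float labels in the file, a dtype-based check should see a numeric column and treat the task as regression.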