nyu-mll / GLUE-baselines

[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
https://gluebenchmark.com

Discrepancies with the original CoLA dataset #3

Closed by davidefiocco 5 years ago

davidefiocco commented 5 years ago

Hi, I noticed that there may be a minor problem with the CoLA dataset.

After downloading the data with the command

python download_glue_data.py --data_dir glue_data --tasks CoLA

I see that line 19 of dev.tsv reads

"bc01 1 He could not] have been working."

and line 6998 of train.tsv reads

"sgww85 1 I consider that a rude remark and in very [NP and PP] bad taste."

The square brackets do not appear in the original CoLA dataset (https://nyu-mll.github.io/CoLA/).

I am not sure what the source of the discrepancy might be.
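For anyone who wants to check their own copy, here is a rough sketch of how lines like these could be located. It assumes the fields are tab-separated with the sentence in the last field; the function name find_bracket_lines and the example path in the comment are made up for illustration and are not part of the GLUE scripts.

```python
import sys
from typing import Iterable, List, Tuple


def find_bracket_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, sentence) for rows containing '[' or ']'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Assumption: tab-separated fields, sentence in the last field.
        sentence = line.rstrip("\n").split("\t")[-1]
        if "[" in sentence or "]" in sentence:
            hits.append((number, sentence))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_brackets.py glue_data/CoLA/dev.tsv
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for number, sentence in find_bracket_lines(handle):
                print(f"{path}:{number}: {sentence}")
```

Running the same check on both the GLUE download and a copy from the CoLA site would show whether the brackets come from the download script or are already in the upstream files.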

davidefiocco commented 5 years ago

Aw, it seems that these are in the original too somehow, apologies!

alexwarstadt commented 5 years ago

Hi Davide, thanks for pointing out the error. I will be releasing an updated version of CoLA with some corrections.