Closed davidefiocco closed 5 years ago
Hi, I noticed that there may be a minor problem with the CoLA dataset.
By downloading the data with the command
python download_glue_data.py --data_dir glue_data --tasks CoLA
I see that line 19 of dev.tsv reads
"bc01 1 He could not] have been working."
and line 6998 of train.tsv reads
"sgww85 1 I consider that a rude remark and in very [NP and PP] bad taste."
The square brackets are not to be found in the original CoLA dataset https://nyu-mll.github.io/CoLA/
I am not sure what the source of the discrepancy may be.
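For anyone who wants to reproduce the check, here is a minimal sketch that lists every line containing a square bracket. It scans raw lines and makes no assumption about the column layout. The helper name find_bracket_lines is hypothetical, and the glue_data/CoLA location is an assumption about where the download script puts the files, so adjust the path if yours differs.

```python
from pathlib import Path


def find_bracket_lines(path):
    """Return (line_number, text) pairs for lines containing '[' or ']'.

    Line numbers are 1-based, matching what a text editor would show.
    """
    hits = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if "[" in line or "]" in line:
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed location of the downloaded files; change if needed.
    data_dir = Path("glue_data") / "CoLA"
    for name in ("dev.tsv", "train.tsv"):
        path = data_dir / name
        if not path.exists():
            print(f"{path} not found, skipping")
            continue
        for number, text in find_bracket_lines(path):
            print(f"{name}:{number}: {text}")
```

Running the same function on a copy of the original CoLA files should show whether the brackets are already present upstream.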
Aw, it seems that these are in the original too somehow, apologies!
Hi Davide, Thanks for pointing out the error. I will be releasing an updated version of CoLA with some corrections.