nyu-mll / GLUE-baselines

[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
https://gluebenchmark.com

Adding GLUE to PyTorch #5

Open PattynR opened 5 years ago

PattynR commented 5 years ago

Hi, I am currently adding some files to the PyTorch project that would enable it to import the GLUE datasets directly. However, I am facing a problem with the QQP and SNLI datasets: on some lines, the number of tabs does not match the number of columns declared in the first line of the file. For example, in the train.tsv file of QQP, line 97,931 is:

"\tWas Muhammad a real historical figure? What is the evidence for his existence?\t0

So that line has only 3 columns, while the file is supposed to have 6 columns per line. How should I handle those lines?

Thank you.

sleepinyourhat commented 5 years ago

Hi P,

We have some notes on this issue here: https://groups.google.com/forum/#!topic/glue-benchmark-discuss/J5p3oTpqogY

Also, for a reference implementation of GLUE data loading/prediction writing, I'd look at jiant rather than this codebase: https://github.com/jsalt18-sentence-repl/jiant
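
For illustration only, here is a minimal sketch of one possible way to deal with such lines: read the file as tab-separated text with quoting disabled and skip any row whose field count differs from the header. This is an assumption about a reasonable policy, not the approach described in the linked notes or the one used by jiant. The function name `read_tsv_skip_malformed` and the sample rows are hypothetical.

```python
import csv
import io


def read_tsv_skip_malformed(handle):
    """Read a TSV stream and drop rows whose field count differs from the header.

    Returns (rows, skipped), where rows is a list of dicts keyed by the header
    names and skipped is a list of 1-based line numbers that were dropped.
    """
    # QUOTE_NONE keeps a stray '"' (as in the QQP line quoted above) from
    # being interpreted as the start of a quoted field.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    rows, skipped = [], []
    # The header is line 1, so data rows start at line 2.
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            skipped.append(line_no)
            continue
        rows.append(dict(zip(header, fields)))
    return rows, skipped


# Hypothetical sample: a six-column header, one well-formed row, and one
# three-column row modelled on the line quoted in the question.
sample = (
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "\"\tWas Muhammad a real historical figure? "
    "What is the evidence for his existence?\t0\n"
)
rows, skipped = read_tsv_skip_malformed(io.StringIO(sample))
print(len(rows), skipped)
```

Returning the skipped line numbers rather than silently dropping rows makes it easy to check how many examples are lost and to inspect them by hand.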