nyu-mll / GLUE-baselines

[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
https://gluebenchmark.com
739 stars 164 forks

The code doesn't work #11

Open kushalarora opened 5 years ago

kushalarora commented 5 years ago

The code in this repository is broken with multiple issues.

First, the code has hard-coded paths; this is unprofessional, and I expected better from such a reputed lab, especially with institutions like NYU, DeepMind, and UW involved.

The path for downloading the MRPC dataset from SentEval is broken. They seem to have moved the data to different URIs, namely:

    MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
    MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'

The command to run the baseline is broken: --eval_tasks needs to be passed explicitly, otherwise an empty string reaches the task definition and a check there fails because the empty string is not among the supported tasks.

Then, half the code has been migrated to QNLIv2, but the dataset-download step still downloads QNLI (v1?), so the code breaks there as well.

Once I got past this error, I encountered the following one:

    tr_generator = iterator(task.train_data, num_epochs=None, cuda_device=self._cuda_device)

Finally, the following error broke my spirit, and I decided not to use the GLUE benchmark for my experiments. Despite importing the conda env with the packages and spending 3-4 hours trying to get the basic command from the README to run, I gave up; I am now skeptical about how many more hidden traps I would have to fix to get the benchmark running.

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
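For context (an assumption based on the error, not something confirmed in this thread): numpy.core._multiarray_umath first appeared in NumPy 1.16, so this error typically means an older NumPy is loading pickled data or compiled code produced by a newer one. A minimal version gate, as a sketch:

```python
# Sketch: check whether a given NumPy version is new enough to provide
# numpy.core._multiarray_umath (introduced in NumPy 1.16). If not, the
# usual fix is upgrading NumPy in the environment.
def numpy_version_ok(version: str) -> bool:
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) >= (1, 16)

print(numpy_version_ok("1.15.4"))  # too old for _multiarray_umath
print(numpy_version_ok("1.16.0"))  # new enough
```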

In case there is a commit or version that I can run out of the box, please let me know; it would be a big help.

sleepinyourhat commented 5 years ago

Thanks for letting us know about all of this! (Though the opening strikes me as needlessly aggressive.)

Could you let us know what kind of experiments/models you're planning to run?

If you're just trying to evaluate a system on GLUE, you should just use the jiant codebase (as we say in big letters in the readme). That's where our ongoing, supported work on this project lives. This codebase only exists as an archive to allow people to reproduce our exact baseline numbers if they need to (it's basically an old internal draft of jiant). We will try to fix the clashes and broken links, though.

kushalarora commented 5 years ago

Hello Prof. Bowman,

I apologize if my comment came across as aggressive; it was more a deep sigh of resignation at my attempt to reproduce the baseline than an accusation. I understand how it could appear aggressive, though, and apologize on behalf of sleep-deprived me, writing that comment at 5 in the morning.

I am just planning to reproduce the baseline experiments. My experiments involve evaluating some word embeddings, and the diagnostic test proposed by GLUE looked well suited for this. I read the comment about using the jiant repo, but all I was trying to do was swap out GloVe for something else, and I thought running this repo might be simpler than running code from jiant.

I also understand that it is difficult to maintain a repo, especially in an academic setting where we don't have an army of engineers to support the effort, but my comments stand. I would request that you edit the GLUE Benchmark site to point to the jiant repo, both for running the baselines and as the main repo for running the benchmark, since the support effort is directed there. Otherwise, at least a few people will give up on using this extremely useful benchmark because they cannot reproduce the experiments.

I would also suggest adding a deprecation warning to the repo, like https://github.com/knowitall/openie, so that it is clear we ought to use the jiant repo directly. The current README indicates that if you don't plan to substantially change the code/models in the baseline, this repo should suffice; that is not the experience I had.

Finally, here are a couple of issues I missed in my last comment:

  1. The package requires me to clone the CoVe package, which ideally should not be the case if I don't want to run CoVe. We could use conditional imports (e.g., via importlib in Python) for such cases.
  2. The diagnostic tests were downloaded into a separate directory, but the preprocessing code expected them in the MNLI directory.
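For the conditional-import suggestion in point 1, a minimal sketch using the standard importlib machinery (the module name "cove" comes from this thread; the fallback behaviour is an assumption, not this repo's actual code):

```python
import importlib
import importlib.util

def load_optional(module_name):
    """Import module_name only if it is installed; return None otherwise."""
    if importlib.util.find_spec(module_name) is None:
        return None  # dependency absent; caller falls back to the non-CoVe path
    return importlib.import_module(module_name)

# CoVe becomes a soft dependency instead of a hard clone requirement.
cove = load_optional("cove")
```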

Once again, apologies for my comment coming across as aggressive. If you like, I can help fix some or most of these issues via a pull request after the ACL deadline.

Regards, Kushal

sleepinyourhat commented 5 years ago

Thanks for the note! If you're far enough along that you're definitely going to try to use this code, PRs are welcome. We may beat you to it, but with the ACL deadline, it's not that likely.

Otherwise, though, jiant is still a work in progress, but it supports all the use cases that this repo does, and it's better documented and maintained. (CoVe is a conditional import there, IIRC, and the download script should be up to date.)

hughperkins commented 5 years ago

@kushalarora thank you for the information about how to fix the URLs. Works perfectly now :) For anyone else, if you get this error message:

Downloading and extracting CoLA...
        Completed!
Downloading and extracting SST...
        Completed!
Processing MRPC...
Traceback (most recent call last):
  File "download_glue_data.py", line 144, in <module>
    sys.exit(main(sys.argv[1:]))
  File "download_glue_data.py", line 136, in main
    format_mrpc(args.data_dir, args.path_to_mrpc)
  File "download_glue_data.py", line 68, in format_mrpc
    URLLIB.urlretrieve(MRPC_TRAIN, mrpc_train_file)
  File "/persist/conda/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/persist/conda/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/persist/conda/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/persist/conda/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/persist/conda/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/persist/conda/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/persist/conda/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

... then open up download_glue_data.py, meander down to lines 45 and 46, and update them per @kushalarora's URLs in the first post.
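Concretely, the patched constants look like this sketch (the fetch_mrpc helper is hypothetical for illustration; download_glue_data.py calls URLLIB.urlretrieve directly, and its line numbers may drift across revisions):

```python
from urllib.request import urlretrieve

# Updated SentEval URLs reported earlier in this thread.
MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'

def fetch_mrpc(train_path='msr_paraphrase_train.txt',
               test_path='msr_paraphrase_test.txt'):
    # Hypothetical helper mirroring what format_mrpc does once the
    # constants above are swapped in.
    urlretrieve(MRPC_TRAIN, train_path)
    urlretrieve(MRPC_TEST, test_path)
```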

Then you will see a healthier

Downloading and extracting CoLA...
        Completed!
Downloading and extracting SST...
        Completed!
Processing MRPC...
        Completed!
Downloading and extracting QQP...
        Completed!
Downloading and extracting STS...
        Completed!
Downloading and extracting MNLI...
        Completed!
Downloading and extracting SNLI...
        Completed!
Downloading and extracting QNLI...
        Completed!
Downloading and extracting RTE...
        Completed!
Downloading and extracting WNLI...
        Completed!
Downloading and extracting diagnostic...
        Completed!

:)

Sleepingbug commented 4 years ago

@kushalarora Thank you!

mingbocui commented 4 years ago

@kushalarora thanks for your sharing, helps a lot

YangQun1 commented 3 years ago

@kushalarora thanks for your sharing, helps a lot