Closed jeswan closed 4 years ago
Comment by pep8speaks Thursday Apr 30, 2020 at 17:41 GMT
Hello @pyeres! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
jiant/utils/data_loaders.py
:Line 124:96: W291 trailing whitespace
You can repair most issues by installing black and running: black -l 100 ./*
. If you contribute often, have a look at the 'Contributing' section of the README for instructions on doing this automatically.
Comment by sleepinyourhat Thursday Apr 30, 2020 at 18:27 GMT
@W4ngatang - Could you look into the SNLI issue? It sounds like that's specific to the GLUE-downloader version that jiant
uses by default, and if that's easy to fix, it's probably better to fix it before we merge this PR.
Issue by pyeres Thursday Apr 30, 2020 at 17:41 GMT Originally opened as https://github.com/nyu-mll/jiant/pull/1083
This PR is motivated by issue #1082 (please see that issue for important context).
This PR would change the behavior of
load_tsv()
: if pandas finds a "bad line" in the data passed toload_tsv()
, apandas.errors.ParserError
will be thrown and the run will stop.How was this PR validated/tested? For all tasks where I had task data on hand (10+ tasks including SNLI), on this PR branch (i.e., w/
error_bad_lines=True
) I ran code to trigger a call toload_tsv()
, and for all tasks except SNLIload_tsv()
worked without an exception. For SNLI (where we know there's some inconsistency in the format of the data file) this (expected) exception is raised:What will this break: this change will cause SNLI task data loading to fail with exception like above until another fix (e.g., an upstream fix to rely on SNLI data without "bad lines") is applied.
pyeres included the following code: https://github.com/nyu-mll/jiant/pull/1083/commits