uclnlp / jack

Jack the Reader

FastQA Throws InvalidArgumentError on ConcatOp #390

Open antonyscerri opened 6 years ago

antonyscerri commented 6 years ago

Hi

Happy to try and provide more details/data as required. The short version of the problem: while training a FastQA model using TensorFlow with the latest code as of 24th August, some mixes of data produce the following error (stack trace below). We are using TensorFlow 1.10.0 due to the machine build, so it may be some incompatibility.

We can run on other similar datasets without any problem. I've looked for empty inputs but nothing obvious has jumped out. We are inputting quite short questions with supporting content about 2000 characters long, and we have seen the error with much longer content too. I wasn't sure if some mix of the words in the question or answer and the vocabulary (using GloVe 6B word vectors) was causing an empty tensor, but it looks like it's not that straightforward and it's somewhere in the internal graph computation, so I'd like to get some assistance or hear if anyone else has experienced anything similar.

I was able to reduce one case to two input records which I could put through a newly initialised model using the prediction call (not training), and this reproduces the same problem. Yet if you pass the records in individually, both return a prediction without any error.
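
For reference, here is a rough sketch of that two-record test. It uses the loading calls from the jack quickstart (readers.reader_from_file and QASetting) rather than my actual script, and the reader path and record contents are placeholders, not our real data:

```python
from jack import readers
from jack.core import QASetting

# Hypothetical reader path for illustration; in my test I used a newly
# initialised model rather than one loaded from disk.
reader = readers.reader_from_file("./fastqa_reader")

records = [
    QASetting(question="Question text for record 1",
              support=["~2000 characters of supporting content ..."]),
    QASetting(question="Question text for record 2",
              support=["~2000 characters of supporting content ..."]),
]

# Passing both records together triggers the ConcatOp error below;
# calling reader([records[0]]) or reader([records[1]]) works fine.
answers = reader(records)
```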

Thanks

Tony

```
Traceback (most recent call last):
  File ".../lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File ".../lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File ".../lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Expected concatenating dimensions in the range [0, 0), but got 0
  [[Node: jtreader/fast_qa/cond_1/segment_top_k/concat = ConcatV2[N=2, T=DT_INT64, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](jtreader/fast_qa/cond_1/segment_top_k/Squeeze, jtreader/fast_qa/cond_1/segment_top_k/sub_1, jtreader/fast_qa/cond_2/GatherV2/axis)]]
```
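
For what it's worth, that message seems to mean ConcatV2 received rank-0 (scalar) inputs: with rank 0 the valid axis range is [0, 0), which is empty, so axis 0 is rejected. This standalone TensorFlow 1.x snippet (nothing jack-specific, just my attempt to understand the message) reproduces the same error at run time, which makes me suspect the segment_top_k/Squeeze node occasionally collapses its input down to a scalar:

```python
import numpy as np
import tensorflow as tf

# Placeholders with unspecified shape, so rank checking is deferred to run time.
a = tf.placeholder(tf.int64, shape=None)
b = tf.placeholder(tf.int64, shape=None)
c = tf.concat([a, b], axis=0)

with tf.Session() as sess:
    # Rank-1 inputs concatenate fine.
    print(sess.run(c, {a: np.array([1], np.int64), b: np.array([2], np.int64)}))
    # Rank-0 (scalar) inputs raise:
    # InvalidArgumentError: ConcatOp : Expected concatenating dimensions
    # in the range [0, 0), but got 0
    print(sess.run(c, {a: np.int64(1), b: np.int64(2)}))
```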

dirkweissenborn commented 6 years ago

Hi, I think I know what the bug is but cannot fix it right now. If my suspicion is correct, a quick fix is to make sure that the number of examples in the dataset does not leave a remainder of 2 when divided by the batch size. So either change the batch size or change the size of the dataset slightly (e.g., add another example or remove one). Please let me know if this works.
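
Something like this plain-Python check should tell you whether a given split hits that case (here dataset stands for whatever list of examples you feed in, and batch_size for the value in your training config):

```python
def hits_remainder_two(dataset, batch_size):
    """True if the final batch would contain exactly 2 examples."""
    return len(dataset) % batch_size == 2

# Possible workaround until a proper fix: drop (or duplicate) one example
# so the final batch no longer has exactly two elements.
if hits_remainder_two(dataset, batch_size):
    dataset = dataset[:-1]
```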

antonyscerri commented 6 years ago

Hi

Thanks for that hint. I had a whole series of runs over various data splits and I can confirm that all the ones which failed had a remainder of 2 on the test set portion. It seems it's OK to have a remainder of 2 in the training portion.

Thanks

Tony