seominjoon / qrn

Query-Reduction Networks (QRN)
http://uwnlp.github.io/qrn/
MIT License

InvalidArgumentError: Received a label value of 292 which is outside the valid range of [0, 10) #12

Open jatinganhotra opened 5 years ago

jatinganhotra commented 5 years ago

I get the following error when training the model on bAbI dialog task 5. The command-line arguments used are:

python dialog/main.py --load=False --task 5 --num_epochs 2 --data_dir "data/dialog-babi-tasks" --val_period 1 --save_period 1 --train=True --draft=True

The exact error is:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 292 which is outside the valid range of [0, 10).  Label values: 0 0 0 0 0 0 0 4 227 292 0 0 0 4 0 0 0 7 0 1 0 32 9 0 0 0 0 0 0 0 0 0
[[Node: towers/gpu_0/loss/ans_loss/SparseSoftmaxCrossEntropyWithLogits_1/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](towers/gpu_0/class/Linear_1/out1, towers/gpu_0/loss/ans_loss/Gather_2)]]
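
To see which entries of the batch fall outside the range, the message itself can be parsed. This is only a minimal sketch and not part of the QRN code: the helper name `out_of_range_labels` is hypothetical, and it assumes the message format shown above.

```python
import re


def out_of_range_labels(message):
    """Return (low, high, offenders) parsed from a label-range error message.

    Hypothetical helper: assumes the "[low, high)" and "Label values:" format
    seen in the error above.
    """
    # The valid range is printed as "[low, high)".
    low, high = map(
        int, re.search(r"valid range of \[(\d+), (\d+)\)", message).groups()
    )
    # The labels follow "Label values:" as space-separated integers.
    values = re.search(r"Label values:\s*([\d\s]+)", message).group(1).split()
    # Keep (batch index, label) for every label outside [low, high).
    offenders = [
        (i, int(v)) for i, v in enumerate(values) if not low <= int(v) < high
    ]
    return low, high, offenders


message = (
    "Received a label value of 292 which is outside the valid range of [0, 10).  "
    "Label values: 0 0 0 0 0 0 0 4 227 292 0 0 0 4 0 0 0 7 0 1 0 32 9 0 0 0 0 0 0 0 0 0"
)
low, high, offenders = out_of_range_labels(message)
print(offenders)  # [(8, 227), (9, 292), (21, 32)]
```

Three of the 32 labels in this batch (227, 292 and 32) are outside [0, 10), so the problem is not limited to the single value named in the message.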

After going through the code, I found that the answers placeholder is split into 8 pieces, each corresponding to a different part of the answer (see https://github.com/uwnlp/qrn/blob/master/prepro-dialog.py#L232).

So we get separate logits for each piece, as follows:

0 = {Tensor} Tensor("towers/gpu_0/class/Linear/out0:0", shape=(32, 15), dtype=float32, device=/device:GPU:0)
1 = {Tensor} Tensor("towers/gpu_0/class/Linear_1/out1:0", shape=(32, 10), dtype=float32, device=/device:GPU:0)
2 = {Tensor} Tensor("towers/gpu_0/class/Linear_2/out2:0", shape=(32, 10), dtype=float32, device=/device:GPU:0)
3 = {Tensor} Tensor("towers/gpu_0/class/Linear_3/out3:0", shape=(32, 4), dtype=float32, device=/device:GPU:0)
4 = {Tensor} Tensor("towers/gpu_0/class/Linear_4/out4:0", shape=(32, 3), dtype=float32, device=/device:GPU:0)
5 = {Tensor} Tensor("towers/gpu_0/class/Linear_5/out5:0", shape=(32, 674), dtype=float32, device=/device:GPU:0)
6 = {Tensor} Tensor("towers/gpu_0/class/Linear_6/out6:0", shape=(32, 645), dtype=float32, device=/device:GPU:0)
7 = {Tensor} Tensor("towers/gpu_0/class/Linear_7/out7:0", shape=(32, 2), dtype=float32, device=/device:GPU:0)

where the 2nd dimension is num_classes for that piece of the answer, where applicable. These sizes match the dictionary sizes computed for each answer position when pre-processing the dataset: [15, 10, 10, 4, 3, 674, 645, 2].
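
One way to narrow this down is to check each piece's labels against that piece's class count before they reach the loss. The following is a sketch under stated assumptions, not the repository's code: `find_label_mismatches` is a hypothetical helper, and the label lists are made up for illustration. Only the class counts come from the shapes above.

```python
def find_label_mismatches(num_classes, labels_per_piece):
    """List (piece, position, label, n) for labels outside [0, n)."""
    mismatches = []
    for piece, (n, labels) in enumerate(zip(num_classes, labels_per_piece)):
        for pos, label in enumerate(labels):
            if not 0 <= label < n:
                mismatches.append((piece, pos, label, n))
    return mismatches


# Class counts reported above for the 8 answer pieces.
num_classes = [15, 10, 10, 4, 3, 674, 645, 2]

# Made-up labels for illustration only: piece 1 carries a label (292)
# that is too large for its 10 classes.
labels_per_piece = [
    [3, 0], [292, 1], [9, 0], [2, 1], [0, 2], [292, 10], [227, 644], [1, 0],
]

mismatches = find_label_mismatches(num_classes, labels_per_piece)
print(mismatches)  # [(1, 0, 292, 10)]

# Pieces whose class count is large enough to hold the label 292.
pieces_accepting_292 = [i for i, n in enumerate(num_classes) if 292 < n]
print(pieces_accepting_292)  # [5, 6]
```

A label of 292 would only be valid for pieces 5 and 6 (674 and 645 classes). That may point to labels and logits from different pieces being paired, though this is a guess rather than a confirmed diagnosis.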

But when I run the code, it throws the error shown above.

I'm using TensorFlow 0.12.1, since 0.11 is now deprecated and, per the release notes, there are no significant changes between the two releases: https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md#release-0120

seominjoon commented 5 years ago

Hi, it seems the issue is due to a TensorFlow incompatibility; I remember that 0.12 had issues. Could you try using 0.11? I cannot promise much about upgrading the code (for compatibility with recent versions) at this point :(