Closed by xy-always 5 years ago
Hi, what do you mean by go wrong? Could you provide more details?
I looked into the DatasetVectorizer class, and it seems that I haven't implemented the out-of-vocabulary (OOV) token case.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 2427 is not in [0, 2426)
  [[Node: embeddings/Gather_1 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embeddings/word_embeddings/read, _arg_Placeholder_1_0_1)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 217, in
That's right, it looks like a bug related to the OOV token: the current version of the code does not support OOV tokens.
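For illustration, one common way to support OOV tokens is to reserve a fixed index for them so that lookups never produce an index outside the embedding table. The following is only a minimal sketch, not the repository's implementation: `SimpleVocabulary`, `OOV_INDEX`, and the whitespace tokenization are all hypothetical names and choices made for this example.

```python
OOV_INDEX = 0  # assumption: index 0 is reserved for unknown tokens


class SimpleVocabulary:
    """Hypothetical stand-in for the vocabulary used by DatasetVectorizer."""

    def __init__(self):
        self.token_to_index = {}

    def fit(self, sentences):
        # Assign indices starting at 1 so that 0 stays free for OOV tokens.
        for sentence in sentences:
            for token in sentence.split():
                if token not in self.token_to_index:
                    self.token_to_index[token] = len(self.token_to_index) + 1
        return self

    def transform(self, sentence):
        # Unknown tokens fall back to OOV_INDEX instead of getting a new index.
        return [self.token_to_index.get(t, OOV_INDEX) for t in sentence.split()]

    @property
    def size(self):
        # Number of embedding rows needed: known tokens plus the OOV slot.
        return len(self.token_to_index) + 1


vocab = SimpleVocabulary().fit(["the cat sat", "the dog ran"])
ids = vocab.transform("the bird sat")  # "bird" was never seen during fit
```

With this layout, the embedding matrix would need `vocab.size` rows, and every index produced by `transform` stays below that bound.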
ok, thanks.
One additional question: did you encounter this issue during training?
Yes. When a sentence is transformed in def vectorize(self, sentence) with a = np.array(list(self.vocabulary.transform([sentence]))), the OOV token index is not zero. I changed it to a = np.array(list(self.vocabulary.fit_transform([sentence]))) and it works.
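Regardless of which call produces the indices, a defensive check before the embedding lookup can make this kind of failure easier to diagnose. The sketch below is an assumption-laden example rather than code from the project: `sanitize_indices` is a hypothetical helper, index 0 is assumed to be the OOV slot, and the table size of 2426 is taken from the error message above.

```python
def sanitize_indices(indices, vocab_size, oov_index=0):
    """Replace any index outside [0, vocab_size) with oov_index."""
    return [i if 0 <= i < vocab_size else oov_index for i in indices]


# 2427 is out of range for a table with 2426 rows, as in the traceback.
safe = sanitize_indices([5, 2427, 12], vocab_size=2426)
```

Note that this hides out-of-range indices rather than explaining where they come from, so it works better as a safety net than as the actual fix.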
I found that if a sentence contains an unknown word, the program fails.